string - How to encode a sequence of symbols in numerical form without losing information?
I want to use a neural network to classify strings. The 'problem' is that neural networks only accept numerical input, so I need a method of encoding a string as a numerical vector. Is there a standard way of approaching such a problem?
I was thinking of counting n-grams, but that approach would result in huge feature vectors if I don't want to lose information, since I'd need to compute everything from 1-grams up to length-of-string-grams. Right?
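To make the size concern concrete, here is a minimal sketch of character n-gram counting in plain Python. The function names (char_ngram_counts, to_vector) and the sample string "abab" are made up for illustration and are not from any particular library.

```python
from collections import Counter


def char_ngram_counts(text, max_n):
    """Count every character n-gram of length 1..max_n in text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def to_vector(counts, vocabulary):
    """Turn the counts into a fixed-order list, one entry per known n-gram."""
    return [counts.get(gram, 0) for gram in vocabulary]


# Hypothetical sample input: count all n-grams up to the full string length.
text = "abab"
full = char_ngram_counts(text, len(text))
vocabulary = sorted(full)          # ['a', 'ab', 'aba', 'abab', 'b', 'ba', 'bab']
vector = to_vector(full, vocabulary)
print(vocabulary)
print(vector)                      # [2, 2, 1, 1, 2, 1, 1]
```

Even this 4-character string already needs 7 features, and a string of length n contains n(n+1)/2 substrings in total. Over a whole dataset the vocabulary is the union of all n-grams seen, so the vector length grows quickly unless max_n is capped.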
So, is there a more compact method of encoding strings as numerical data? One that maintains information about both the frequency of symbols and their order?
Might this be what you are searching for? https://code.google.com/p/word2vec/
You can vectorize words (and symbols) with word2vec, add the vectors (or subtract them when facing a negation), and then divide the result by the number of words added to "build the string". This gives a kind of vectorial mean that scales the result back. I have not tested this tool yet.
You asked about the frequency and order of words in strings. I think the order might be flushed away by this technique, but not the count.
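As a rough sketch of the averaging idea, the snippet below uses a tiny hand-written table of vectors instead of a trained word2vec model. The table toy_vectors, its values, and the helper mean_vector are all invented for illustration; real vectors would be learned from a corpus.

```python
# Toy embeddings, invented for this example (NOT output of word2vec).
toy_vectors = {
    "dog": [1.0, 0.0, 2.0],
    "bites": [0.0, 3.0, 1.0],
    "man": [2.0, 1.0, 0.0],
}


def mean_vector(words, vectors):
    """Sum the vectors of the given words and divide by the word count."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for word in words:
        for i, value in enumerate(vectors[word]):
            total[i] += value
    return [value / len(words) for value in total]


# Same words in a different order give the same mean: order is lost.
a = mean_vector(["dog", "bites", "man"], toy_vectors)
b = mean_vector(["man", "bites", "dog"], toy_vectors)
print(a == b)

# Repeating a word pulls the mean toward that word's vector.
c = mean_vector(["dog", "man"], toy_vectors)
d = mean_vector(["dog", "dog", "man"], toy_vectors)
print(c, d)
```

In this toy case the two orderings are indistinguishable after averaging, while repeating "dog" changes the result, so relative frequency still leaves a trace in the mean.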