Given vectors that represent words, how do we construct sentences? Do we add the vectors? Do we find centroids? Do we normalize before, after or not at all?
In fact, can we even say we are dealing with a vector space?
Remember, a vector space has the following 8 properties:
- Identity (of addition and multiplication)
- Distributivity (of scalars and vectors)
- Addition also has commutativity and associativity, plus an inverse.
- Compatibility: (ab)v = a(bv)
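Spelled out in full, the eight axioms behind that condensed list are the standard ones below, written for vectors u, v, w and scalars a, b (the symbols are my choice, picked to match the compatibility line above):

```latex
\begin{align*}
u + (v + w) &= (u + v) + w && \text{(associativity of addition)} \\
u + v &= v + u && \text{(commutativity of addition)} \\
v + 0 &= v && \text{(additive identity)} \\
v + (-v) &= 0 && \text{(additive inverse)} \\
(ab)v &= a(bv) && \text{(compatibility)} \\
1v &= v && \text{(multiplicative identity)} \\
a(u + v) &= au + av && \text{(distributivity over vector addition)} \\
(a + b)v &= av + bv && \text{(distributivity over scalar addition)}
\end{align*}
```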
"The commutativity property of vector addition does not always hold in semantics. Therefore, this property shouldn't (always) hold in the embedding space either. Thus, the embedding space should not be called a vector space.
E.g. attempt to treat semantic composition as vector addition in the vector space:
v_{rescue dog} = v_{rescue} + v_{dog} (a dog which is trained to rescue people)
v_{dog rescue} = v_{dog} + v_{rescue} (the operation of saving a dog)
The phrases "rescue dog" and "dog rescue" mean different things, but in our hypothetical vector space, they would (incorrectly) have the same vector representation (due to commutativity).
Similarly for the associativity property."
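The collision described in the quote is easy to reproduce. Here is a minimal sketch using made-up 3-dimensional vectors; the values and the helper name `compose_by_addition` are purely illustrative and do not come from any trained model:

```python
import numpy as np

# Toy embeddings: the numbers are invented for illustration only.
v_rescue = np.array([0.2, -1.0, 0.5])
v_dog = np.array([1.5, 0.3, -0.7])


def compose_by_addition(*word_vectors):
    """Compose a phrase vector by summing its word vectors (word order is ignored)."""
    return np.sum(word_vectors, axis=0)


rescue_dog = compose_by_addition(v_rescue, v_dog)
dog_rescue = compose_by_addition(v_dog, v_rescue)

# Addition is commutative, so both phrases collapse to the same vector.
phrases_collide = np.allclose(rescue_dog, dog_rescue)
print(phrases_collide)
```

Whatever the actual embeddings are, any composition built purely on addition will give the two phrases the same representation.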
At the same time, erroneous assumptions are not necessarily unacceptable (as the post points out). Assuming a vector space simply gives us a high-bias model.
For fun, I tried the vectors that Word2Vec gave me. I could think of no reason why the word vectors this algorithm produces should combine into a meaningful sentence vector. But the results were surprising.
|Composition method|Accuracy (%)|
|---|---|
|Raw vector addition|81.0|
|Normalized vector addition|27.9|
|Raw vector centroids|7|
|Raw vector addition then normalizing|7|
That is, adding together Word2Vec-generated word vectors to make a sentence vector gave my neural net decent (but not stellar) results.
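For concreteness, here is roughly what the four composition methods look like in NumPy. This is a sketch under assumptions: I take "normalized" to mean scaling to unit L2 length, "normalized vector addition" to mean normalizing each word vector before summing, and the random `sentence` matrix is a stand-in for real Word2Vec output. The function names are my own.

```python
import numpy as np


def unit(v, eps=1e-12):
    """Scale v to unit L2 length (eps guards against a zero vector)."""
    return v / max(np.linalg.norm(v), eps)


def raw_addition(vectors):
    return np.sum(vectors, axis=0)


def normalized_addition(vectors):
    # Assumption: normalize each word vector first, then add.
    return np.sum([unit(v) for v in vectors], axis=0)


def raw_centroid(vectors):
    return np.mean(vectors, axis=0)


def raw_addition_then_normalize(vectors):
    return unit(raw_addition(vectors))


# Stand-in for real embeddings: 4 words, 5 dimensions, random values.
rng = np.random.default_rng(0)
sentence = rng.normal(size=(4, 5))

features = {
    "raw addition": raw_addition(sentence),
    "normalized addition": normalized_addition(sentence),
    "raw centroid": raw_centroid(sentence),
    "raw addition then normalizing": raw_addition_then_normalize(sentence),
}
for name, vector in features.items():
    print(name, np.round(vector, 3))
```

Note that the raw centroid is just the raw sum divided by the number of words, so the two point in the same direction and differ only in length.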
More promising was combining vectors from category space. The results looked like this:
|Composition method|Accuracy (%)|
|---|---|
|Normalized vector addition|94.0|
|Normalized vector centroids|92.1|
|Adding unnormalized vectors|6.2|
|Normalized vector addition then normalizing|5.3|
Finally, concatenating the word vectors for a sentence (truncating if a text had more than 10 words and padding if it had fewer) and feeding the result into an ANN produced an accuracy of 94.2%. Naive Bayes and Random Forest gave similar results (92.3% and 93.3% respectively).
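The truncate-or-pad step can be sketched as follows. The limit of 10 words comes from the text above; padding with zeros and the function name `concatenate_fixed_length` are assumptions for illustration, and the sketch expects at least one word vector per text.

```python
import numpy as np

MAX_WORDS = 10  # limit taken from the text: truncate or pad to 10 words


def concatenate_fixed_length(word_vectors, max_words=MAX_WORDS):
    """Flatten an (n_words, dim) array into one vector of length max_words * dim.

    Texts longer than max_words are truncated; shorter ones are padded
    with zero vectors (zero padding is an assumption, not from the post).
    """
    word_vectors = np.asarray(word_vectors, dtype=float)
    n_words, dim = word_vectors.shape
    padded = np.zeros((max_words, dim))
    keep = min(n_words, max_words)
    padded[:keep] = word_vectors[:keep]
    return padded.reshape(-1)


short_text = np.ones((3, 4))   # 3 words, 4 dimensions
long_text = np.ones((12, 4))   # 12 words, 4 dimensions

print(concatenate_fixed_length(short_text).shape)
print(concatenate_fixed_length(long_text).shape)
```

Unlike addition or centroids, this representation keeps word order, at the cost of a fixed and much larger input size.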
Note: multiplying each vector by a factor between 0.5 and 1.5 made no difference to the accuracy of an ANN. The weights simply adjust to compensate.
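Why rescaling is harmless is easiest to see for a single linear layer and one uniform scale factor: dividing the weights by the factor gives exactly the same output. The sketch below uses random made-up weights and inputs. It does not cover the case where each vector gets its own factor, where the network has to learn to cope during training rather than cancel the factor exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=8)        # a made-up input vector
W = rng.normal(size=(3, 8))   # weights of a hypothetical linear layer
b = rng.normal(size=3)        # bias of that layer

scale = 1.5                   # any nonzero factor works

original = W @ x + b
# Scale the input up, scale the weights down by the same factor.
rescaled = (W / scale) @ (scale * x) + b

same_output = np.allclose(original, rescaled)
print(same_output)
```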
It seems like an ANN can take data in pretty much any format (although it's better when we're using variations on the category space rather than TF-IDF).
When I asked a Data Science PhD I work with which technique I should use (adding vectors, finding centroids, concatenating vectors, etc.), his answer was: "Yes".