Using one-hot encodings to represent words
How do they work?
Take all the words you know and initialise a list of zeros of that length (with each word tied to an index). Then iterate over the sentence you want to encode, changing the value associated with each word to the number of times it appears. E.g.:
The sentence “I love apples, bananas and dates, but apples the most!” would be encoded as something like [1, 1, 2, 1, 1, 1, 1, 1, 1]: one slot per vocabulary word, with a 2 at the index for ‘apples’ because it appears twice.
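A minimal sketch of that counting scheme in Python (the tiny vocabulary and the crude punctuation stripping are just assumptions for illustration):

```python
# Toy vocabulary: each word is tied to an index by its position.
vocab = ["i", "love", "apples", "bananas", "and", "dates", "but", "the", "most"]
index = {word: i for i, word in enumerate(vocab)}

def encode(sentence: str) -> list[int]:
    """Return a vector of word counts, one slot per vocabulary word."""
    counts = [0] * len(vocab)
    for word in sentence.lower().replace(",", "").replace("!", "").split():
        counts[index[word]] += 1
    return counts

print(encode("I love apples, bananas and dates, but apples the most!"))
# [1, 1, 2, 1, 1, 1, 1, 1, 1]  <- the 2 is 'apples', which appears twice
```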
This isn’t technically accurate, though. As ‘one-hot’ implies, a true one-hot encoding represents each word as its own vector with a single 1: [1, 0, 0, …], [0, 1, 0, …], [0, 0, 1, …], and so on.
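As a sketch, true one-hot vectors over the same toy vocabulary would look like this:

```python
vocab = ["i", "love", "apples", "bananas", "and", "dates", "but", "the", "most"]

def one_hot(word: str) -> list[int]:
    """A single 1 at the word's index, 0 everywhere else."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("i"))       # [1, 0, 0, 0, 0, 0, 0, 0, 0]
print(one_hot("love"))    # [0, 1, 0, 0, 0, 0, 0, 0, 0]
print(one_hot("apples"))  # [0, 0, 1, 0, 0, 0, 0, 0, 0]
```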
Surely that’s inefficient?
It may seem inefficient for large dictionaries, but computers are quite good at representing long lists of 0s (sparse representations only store the non-zero entries). Also, most of the time we just use the index of the 1 and work our magic from there.
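For instance, here is that ‘just use the index’ idea in a quick sketch (same toy vocabulary as above):

```python
vocab = ["i", "love", "apples", "bananas", "and", "dates", "but", "the", "most"]

# The whole one-hot vector boils down to a single integer: where the 1 is.
word_id = vocab.index("apples")
print(word_id)  # 2

# Frameworks exploit this: an embedding lookup is just "fetch row word_id"
# from a matrix, without ever materialising the full one-hot vector.
```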
Why does it suck?
The reason it’s not the greatest is that one-hot vectors make it quite difficult to learn relationships between words: every word is exactly as different from every other word. This is something Vector Embeddings are much better at.
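One way to see the problem (a sketch with the same toy vocabulary): the dot product of any two distinct one-hot vectors is always 0, so by this measure ‘apples’ is exactly as unrelated to ‘bananas’ as it is to ‘but’:

```python
vocab = ["i", "love", "apples", "bananas", "and", "dates", "but", "the", "most"]

def one_hot(word: str) -> list[int]:
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

def dot(a: list[int], b: list[int]) -> int:
    """Dot product: a simple similarity measure between two vectors."""
    return sum(x * y for x, y in zip(a, b))

print(dot(one_hot("apples"), one_hot("bananas")))  # 0
print(dot(one_hot("apples"), one_hot("but")))      # 0 -- no notion of similarity
```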