word2vec

The word2vec algorithm (or, rather, family of algorithms) takes a text corpus as input and produces word vectors as output. It first constructs a vocabulary from the training text data and then learns vector representations of the words. These vectors can then be used as features for machine learning algorithms.
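As a minimal sketch of this pipeline (assuming the gensim library, version 4.x, and a toy tokenized corpus), building the vocabulary, learning the vectors, and inspecting one of them might look like this:

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (in practice, a much larger corpus is used).
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["paris", "is", "the", "capital", "of", "france"],
    ["rome", "is", "the", "capital", "of", "italy"],
]

# Build the vocabulary and learn 100-dimensional word vectors.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=2)

# Each word in the vocabulary now has a dense vector that can be fed to downstream models.
print(model.wv["king"].shape)   # (100,)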

Word vectors are able to catch some intuitive regularities in the language, for instance:

vector('Paris') - vector('France') + vector('Italy')

results in a vector that is very close to

vector('Rome').

Similarly,

vector('king') - vector('man') + vector('woman')

is close to

vector('queen').
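These analogies can be reproduced with, for example, gensim's most_similar method. The sketch below assumes a set of pretrained vectors in word2vec format; the file path is illustrative:

from gensim.models import KeyedVectors

# Load pretrained vectors (any word2vec-format file works; this path is an assumption).
wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# vector('Paris') - vector('France') + vector('Italy') is closest to vector('Rome').
print(wv.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=1))

# vector('king') - vector('man') + vector('woman') is closest to vector('queen').
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))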

What does word2vec do behind the scenes? word2vec arrives at word vectors by training a neural network to predict:

  • A word in the center from its surroundings (continuous bag of words, CBOW)
  • A word's surroundings from the center word (skip-gram model)
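In gensim, for instance, the choice between the two architectures is controlled by the sg flag; the snippet below is a sketch assuming a toy tokenized corpus:

from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]]

# CBOW: predict the center word from its surrounding context words (sg=0, the default).
cbow_model = Word2Vec(sentences, sg=0, vector_size=50, window=2, min_count=1)

# Skip-gram: predict the surrounding context words from the center word (sg=1).
skipgram_model = Word2Vec(sentences, sg=1, vector_size=50, window=2, min_count=1)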

Why is that useful? According to the distributional hypothesis, words that occur in similar contexts tend to have similar meanings. Researchers Goldberg and Levy point out (https://arxiv.org/abs/1402.3722) that the word2vec objective function causes words that occur in similar contexts to have similar embeddings, which is in line with the distributional hypothesis. However, they also note that a more rigorous explanation is still required.

The distributional hypothesis states that words used in the same contexts tend to have related meanings. This is the underlying assumption behind vector embedding algorithms such as word2vec and GloVe.
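A rough way to see this in practice (assuming gensim and a tiny hand-made corpus, so the numbers are illustrative rather than stable) is to compare similarities between words that do and do not share contexts:

from gensim.models import Word2Vec

# Toy corpus in which 'coffee' and 'tea' appear in the same contexts.
sentences = [
    ["i", "drink", "hot", "coffee", "every", "morning"],
    ["i", "drink", "hot", "tea", "every", "morning"],
    ["she", "ordered", "a", "cup", "of", "coffee"],
    ["she", "ordered", "a", "cup", "of", "tea"],
    ["the", "car", "needs", "new", "tires"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=42, epochs=200)

# Words sharing contexts ('coffee'/'tea') should score higher than unrelated pairs.
print(model.wv.similarity("coffee", "tea"))
print(model.wv.similarity("coffee", "tires"))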