GloVe

About a year after word2vec, researchers at Stanford published a paper (http://web.stanford.edu/~jpennin/papers/glove.pdf) that explicitly identifies the objective that word2vec optimizes under the hood. Their method, GloVe (Global Vectors), names the objective matrix, identifies the factorization, and provides an intuitive justification for why this should give us working similarities.

This section describes, at a high level, the inner workings of the GloVe algorithm. As such, it is a bit more math-heavy than we would have liked. If you are not familiar with linear algebra and calculus, you can safely skip it.

How does GloVe work?

  1. Create a word co-occurrence matrix, where each entry $X_{ij}$ represents how often word i appears in the context of word j. We scan the corpus looking for words that co-occur within a fixed-size window around each term, giving less weight to more distant words. Because every pair is counted from both sides of the window, the matrix is symmetric. (A code sketch of this step follows the list.)
  2. Define soft constraints for each word pair.
  3. Finally, introduce a cost function that penalizes learning from very common word pairs.
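
Concretely, step 1 might look like the following sketch. It assumes a tokenized corpus and a fixed vocabulary; the names (build_cooccurrence, window_size, and so on) are illustrative, not taken from any particular GloVe implementation.

```python
from collections import defaultdict

def build_cooccurrence(tokens, vocab, window_size=5):
    """Count weighted co-occurrences: a context word at distance d
    from the center word contributes 1/d to the count."""
    word_to_id = {w: i for i, w in enumerate(vocab)}
    X = defaultdict(float)  # sparse matrix: (i, j) -> weighted count
    for center, word in enumerate(tokens):
        if word not in word_to_id:
            continue
        i = word_to_id[word]
        lo = max(0, center - window_size)
        hi = min(len(tokens), center + window_size + 1)
        for pos in range(lo, hi):
            if pos == center or tokens[pos] not in word_to_id:
                continue
            j = word_to_id[tokens[pos]]
            X[(i, j)] += 1.0 / abs(pos - center)  # nearer words weigh more
    return X

# Tiny usage example: X is symmetric because every pair is visited
# from both sides of the window.
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
X = build_cooccurrence(corpus, vocab, window_size=2)
```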
By soft constraints, we mean that for each word pair the model should satisfy:

$w_i^\top \tilde{w}_j + b_i + \tilde{b}_j = \log X_{ij}$

where $w_i$ and $\tilde{w}_j$ denote the main and context embedding vectors, respectively, with biases $b_i$ and $\tilde{b}_j$, and $X_{ij}$ is the co-occurrence count of word j in the context of word i.

The cost function is defined by:

$J = \sum_{i=1}^{V} \sum_{j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$

where $V$ is the size of the vocabulary and the weighting function $f$ is defined by:

$f(X_{ij}) = \begin{cases} \left( X_{ij} / x_{\max} \right)^{\alpha} & \text{if } X_{ij} < x_{\max} \\ 1 & \text{otherwise} \end{cases}$

where $\alpha$ and $x_{\max}$ are hyperparameters that you can choose; the GloVe paper reports that $\alpha = 3/4$ and $x_{\max} = 100$ work well in practice.
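
To make the objective concrete, here is a minimal NumPy sketch of the weighting function and of the cost $J$, written directly from the formulas above. It assumes the co-occurrence counts are stored as a sparse dictionary of nonzero entries (so $\log X_{ij}$ is always defined); all names are illustrative.

```python
import numpy as np

def f(x, x_max=100.0, alpha=0.75):
    """GloVe weighting: down-weights rare pairs and caps very frequent ones."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_cost(W, W_ctx, b, b_ctx, X):
    """J = sum over pairs of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2.

    W, W_ctx -- (V, d) main and context embedding matrices
    b, b_ctx -- (V,) bias vectors
    X        -- dict mapping (i, j) to a nonzero co-occurrence count
    """
    total = 0.0
    for (i, j), x_ij in X.items():
        residual = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(x_ij)
        total += float(f(x_ij)) * residual ** 2
    return total

# Usage with toy counts and random parameters, just to show the shapes;
# training would minimize J with a gradient method (the paper uses AdaGrad).
V, d = 6, 4
rng = np.random.default_rng(0)
W, W_ctx = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_ctx = rng.normal(size=V), rng.normal(size=V)
X = {(0, 1): 3.0, (1, 0): 3.0, (2, 4): 1.5, (4, 2): 1.5}  # symmetric toy counts
print(glove_cost(W, W_ctx, b, b_ctx, X))
```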
