GloVe

About a year after word2vec, researchers at Stanford published a paper (http://web.stanford.edu/~jpennin/papers/glove.pdf) that explicitly identifies the objective that word2vec optimizes under the hood. Their method, GloVe (Global Vectors), names the objective matrix, identifies the factorization, and provides an intuitive justification for why this should give us working similarities.

This section describes, at a high level, the inner workings of the GloVe algorithm. As such, it is a bit more math-heavy than we would have liked. If you are not familiar with linear algebra and calculus, you can safely skip it.

How does GloVe work?

  1. Create a word co-occurrence matrix, where each entry $X_{ij}$ represents how often word i appears in the context of word j. We scan the corpus looking for words that co-occur within a fixed-size window around each term, giving less weight to more distant words. Because every pair is counted from both sides of the window, the matrix is symmetric. (A code sketch of this step follows the list.)
  2. Define soft constraints for each word pair.
  3. Finally, introduce a cost function that penalizes learning from very common word pairs.
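
Concretely, step 1 might look like the following sketch. It assumes a tokenized corpus and a fixed vocabulary; the names (build_cooccurrence, window_size, and so on) are illustrative, not taken from any particular GloVe implementation.

```python
from collections import defaultdict

def build_cooccurrence(tokens, vocab, window_size=5):
    """Count weighted co-occurrences: a context word at distance d
    from the center word contributes 1/d to the count."""
    word_to_id = {w: i for i, w in enumerate(vocab)}
    X = defaultdict(float)  # sparse matrix: (i, j) -> weighted count
    for center, word in enumerate(tokens):
        if word not in word_to_id:
            continue
        i = word_to_id[word]
        lo = max(0, center - window_size)
        hi = min(len(tokens), center + window_size + 1)
        for pos in range(lo, hi):
            if pos == center or tokens[pos] not in word_to_id:
                continue
            j = word_to_id[tokens[pos]]
            X[(i, j)] += 1.0 / abs(pos - center)  # nearer words weigh more
    return X

# Tiny usage example: X is symmetric because every pair is visited
# from both sides of the window.
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
X = build_cooccurrence(corpus, vocab, window_size=2)
```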
By soft constraints, we mean that for each word pair the model should satisfy:

$w_i^\top \tilde{w}_j + b_i + \tilde{b}_j = \log X_{ij}$

where $w_i$ and $\tilde{w}_j$ denote the main and context embedding vectors, respectively, with biases $b_i$ and $\tilde{b}_j$, and $X_{ij}$ is the co-occurrence count of word j in the context of word i.

The cost function is defined by:

$J = \sum_{i=1}^{V} \sum_{j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$

where $V$ is the size of the vocabulary and the weighting function $f$ is defined by:

$f(X_{ij}) = \begin{cases} \left( X_{ij} / x_{\max} \right)^{\alpha} & \text{if } X_{ij} < x_{\max} \\ 1 & \text{otherwise} \end{cases}$

where $\alpha$ and $x_{\max}$ are hyperparameters that you can choose; the GloVe paper reports that $\alpha = 3/4$ and $x_{\max} = 100$ work well in practice.
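
To make the objective concrete, here is a minimal NumPy sketch of the weighting function and of the cost $J$, written directly from the formulas above. It assumes the co-occurrence counts are stored as a sparse dictionary of nonzero entries (so $\log X_{ij}$ is always defined); all names are illustrative.

```python
import numpy as np

def f(x, x_max=100.0, alpha=0.75):
    """GloVe weighting: down-weights rare pairs and caps very frequent ones."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_cost(W, W_ctx, b, b_ctx, X):
    """J = sum over pairs of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2.

    W, W_ctx -- (V, d) main and context embedding matrices
    b, b_ctx -- (V,) bias vectors
    X        -- dict mapping (i, j) to a nonzero co-occurrence count
    """
    total = 0.0
    for (i, j), x_ij in X.items():
        residual = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(x_ij)
        total += float(f(x_ij)) * residual ** 2
    return total

# Usage with toy counts and random parameters, just to show the shapes;
# training would minimize J with a gradient method (the paper uses AdaGrad).
V, d = 6, 4
rng = np.random.default_rng(0)
W, W_ctx = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_ctx = rng.normal(size=V), rng.normal(size=V)
X = {(0, 1): 3.0, (1, 0): 3.0, (2, 4): 1.5, (4, 2): 1.5}  # symmetric toy counts
print(glove_cost(W, W_ctx, b, b_ctx, X))
```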
