GloVe

Global Vectors for Word Representation (GloVe) is a model for representing words as vectors. It is an unsupervised learning method that learns from a word co-occurrence count matrix. It begins by building a large matrix that covers essentially the whole vocabulary and records co-occurrence information: how frequently each word appears near each other word in the given text.

GloVe is available from Stanford NLP, but that implementation is not written in Java. To read more about GloVe, see the paper at https://nlp.stanford.edu/pubs/glove.pdf. A brief introduction and further resources for Stanford's GloVe can be found at https://nlp.stanford.edu/projects/glove/. To get an idea of what GloVe does, we will use a Java implementation of GloVe, JGloVe, found at https://github.com/erwtokritos/JGloVe.
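To make the idea of a co-occurrence count matrix concrete, here is a minimal Java sketch. It is not JGloVe's code: the class and method names are hypothetical, the matrix is stored sparsely as a map keyed by word pairs, and each line of text is treated as a separate context (an assumption of this sketch). Pairs of words that are d positions apart contribute 1/d to their count, the decreasing weighting described in the GloVe paper.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of GloVe-style co-occurrence counting (not JGloVe's API).
 * Words within `windowSize` positions of each other on the same line add
 * 1/distance to their pair count; each line is treated as its own context.
 */
public class CooccurrenceSketch {

    /** Returns a sparse, symmetric map "wordA|wordB" -> weighted co-occurrence count. */
    public static Map<String, Double> countCooccurrences(String[] lines, int windowSize) {
        Map<String, Double> counts = new HashMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                // Look only to the right of word i, then record both directions
                // so the resulting matrix is symmetric.
                for (int j = i + 1; j < tokens.length && j - i <= windowSize; j++) {
                    double weight = 1.0 / (j - i);
                    counts.merge(tokens[i] + "|" + tokens[j], weight, Double::sum);
                    counts.merge(tokens[j] + "|" + tokens[i], weight, Double::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"graph minors trees", "graph trees"};
        Map<String, Double> x = countCooccurrences(lines, 3);
        // 0.5 from the first line (distance 2) plus 1.0 from the second (distance 1)
        System.out.println(x.get("graph|trees")); // prints 1.5
    }
}
```

A real implementation would map words to integer indices from the vocabulary rather than concatenating strings, but the map keeps the sketch short.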

The JGloVe code also includes a test file and a sample text file. The text file's contents are as follows:

human interface computer
survey user computer system response time
eps user interface system
system human system eps
user response time
trees
graph trees
graph minors trees
graph minors survey
I like graph and stuff
I like trees and stuff
Sometimes I build a graph
Sometimes I build trees
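The first step of a GloVe run is building a vocabulary from this file. The minimal sketch below assumes plain whitespace tokenization with case preserved; the class is hypothetical and is not JGloVe's build_vocabulary. Under those assumptions it finds 19 distinct tokens in the sample text, the same number of terms JGloVe reports in its log.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical vocabulary-building sketch (not JGloVe's API). */
public class VocabularySketch {

    // The sample text file, one entry per line.
    public static final String[] SAMPLE_LINES = {
        "human interface computer",
        "survey user computer system response time",
        "eps user interface system",
        "system human system eps",
        "user response time",
        "trees",
        "graph trees",
        "graph minors trees",
        "graph minors survey",
        "I like graph and stuff",
        "I like trees and stuff",
        "Sometimes I build a graph",
        "Sometimes I build trees"
    };

    /** Maps each distinct token to its frequency, in first-seen order. */
    public static Map<String, Integer> buildVocabulary(String[] lines) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) { // skip blank lines
                    vocab.merge(token, 1, Integer::sum);
                }
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = buildVocabulary(SAMPLE_LINES);
        System.out.println("There are " + vocab.size() + " terms"); // prints "There are 19 terms"
    }
}
```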

GloVe can then find words that are similar to a given word in this text. The results of finding the words most similar to graph are as follows:

INFO: Building vocabulary complete.. There are 19 terms
Iteration #1 , cost = 0.4109707480627031
Iteration #2 , cost = 0.37748817335537205
Iteration #3 , cost = 0.3563396433036622
Iteration #4 , cost = 0.3483667149265019
Iteration #5 , cost = 0.3434632969758875
Iteration #6 , cost = 0.33917154339742045
Iteration #7 , cost = 0.3304641363014488
Iteration #8 , cost = 0.32717383183159243
Iteration #9 , cost = 0.3240225514512226
Iteration #10 , cost = 0.32196412138868596
@trees
@minors
@computer
@a
@like
@survey
@eps
@interface
@and
@human
@user
@time
@response
@system
@Sometimes

So, the first matching word is "trees," followed by "minors," and so on. The test code that produces this output is as follows:

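// Options, Vocabulary, GloVe, Cooccurrence, Methods and DoubleMatrix
// must also be imported from the JGloVe project and its dependencies
import java.util.List;

// Sample corpus shipped with JGloVe
String file = "test.txt";

// Training options; debug output prints the cost of each iteration
Options options = new Options();
options.debug = true;

// Step 1: build the vocabulary ("There are 19 terms" in the log)
Vocabulary vocab = GloVe.build_vocabulary(file, options);

// Step 2: count word co-occurrences using a context window of size 3
options.window_size = 3;
List<Cooccurrence> c = GloVe.build_cooccurrence(vocab, file, options);

// Step 3: train 10-dimensional word vectors for 10 iterations
options.iterations = 10;
options.vector_size = 10;
DoubleMatrix W = GloVe.train(vocab, c, options);

// Step 4: print the 15 words most similar to "graph"
List<String> similars = Methods.most_similar(W, vocab, "graph", 15);
for (String similar : similars) {
    System.out.println("@" + similar);
}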
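Methods.most_similar ranks the rest of the vocabulary against the vector for graph. A common way to do this ranking is cosine similarity, and the sketch below assumes it; JGloVe's actual scoring may differ. The class name, the use of plain double[] vectors instead of a DoubleMatrix, and the toy two-dimensional vectors are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical most-similar lookup using cosine similarity (not JGloVe's API). */
public class MostSimilarSketch {

    /** Cosine of the angle between two vectors; undefined (NaN) for zero vectors. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to k words ranked by cosine similarity to `query`, excluding the query itself. */
    public static List<String> mostSimilar(Map<String, double[]> vectors, String query, int k) {
        double[] q = vectors.get(query);
        if (q == null) {
            throw new IllegalArgumentException("Unknown word: " + query);
        }
        List<String> candidates = new ArrayList<>();
        for (String word : vectors.keySet()) {
            if (!word.equals(query)) {
                candidates.add(word);
            }
        }
        // Highest similarity first
        candidates.sort(Comparator.comparingDouble((String w) -> cosine(q, vectors.get(w))).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        Map<String, double[]> vectors = Map.of(
            "graph", new double[]{1.0, 0.0},
            "trees", new double[]{0.9, 0.1},
            "minors", new double[]{0.5, 0.5},
            "time", new double[]{0.0, 1.0});
        System.out.println(mostSimilar(vectors, "graph", 2)); // prints [trees, minors]
    }
}
```

The query word is left out of the candidates because it would always rank first with a similarity of 1; graph likewise does not appear in JGloVe's output.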