Using Stanford MaxentTagger

The MaxentTagger class uses a model to perform the tagging task. There are a number of models that come bundled with the API, all with the file extension .tagger. They include English, Chinese, Arabic, French, and German models.
The English models are listed here. The prefix, wsj, refers to models based on the Wall Street Journal. The other terms refer to techniques used to train the model. These concepts are not covered here:

  • wsj-0-18-bidirectional-distsim.tagger
  • wsj-0-18-bidirectional-nodistsim.tagger
  • wsj-0-18-caseless-left3words-distsim.tagger
  • wsj-0-18-left3words-distsim.tagger
  • wsj-0-18-left3words-nodistsim.tagger
  • english-bidirectional-distsim.tagger
  • english-caseless-left3words-distsim.tagger
  • english-left3words-distsim.tagger

The example reads in a series of sentences from a file. Each sentence is then processed and various ways of accessing and displaying the words and tags are shown.

We start with a try-with-resources block to deal with IO exceptions, as shown here. The wsj-0-18-bidirectional-distsim.tagger file is used to create an instance of the MaxentTagger class.

The MaxentTagger class's static tokenizeText method reads the sentences from the sentences.txt file and returns a list of sentences, where each sentence is itself a List of HasWord objects. The HasWord interface represents a word and declares two methods: setWord and word. The latter returns the word as a string:

try (BufferedReader reader = new BufferedReader(
        new FileReader("sentences.txt"))) {
    // Load the tagging model
    MaxentTagger tagger = new MaxentTagger(getModelDir() +
        "//wsj-0-18-bidirectional-distsim.tagger");
    // Split the file's text into sentences of HasWord tokens
    List<List<HasWord>> sentences = MaxentTagger.tokenizeText(reader);
    ...
} catch (IOException ex) {
    // Handle exceptions
}

The sentences.txt file contains the first four sentences of Chapter 5, At A Venture, of the book Twenty Thousand Leagues Under the Sea:

The voyage of the Abraham Lincoln was for a long time marked by no special incident. 
But one circumstance happened which showed the wonderful dexterity of Ned Land, and proved what confidence we might place in him. 
The 30th of June, the frigate spoke some American whalers, from whom we learned that they knew nothing about the narwhal. 
But one of them, the captain of the Monroe, knowing that Ned Land had shipped on board the Abraham Lincoln, begged for his help in chasing a whale they had in sight.

A loop processes each sentence in the sentences list. The tagSentence method returns a List of TaggedWord objects, as shown in the following code. The TaggedWord class implements the HasWord interface and adds a tag method that returns the tag associated with the word. Passing the list to println invokes its toString method, which displays each sentence:

for (List<HasWord> sentence : sentences) {
    List<TaggedWord> taggedSentence = tagger.tagSentence(sentence);
    System.out.println(taggedSentence);
}

The output is as follows:

    [The/DT, voyage/NN, of/IN, the/DT, Abraham/NNP, Lincoln/NNP, was/VBD, for/IN, a/DT, long/JJ, time/NN, marked/VBN, by/IN, no/DT, special/JJ, incident/NN, ./.]
    [But/CC, one/CD, circumstance/NN, happened/VBD, which/WDT, showed/VBD, the/DT, wonderful/JJ, dexterity/NN, of/IN, Ned/NNP, Land/NNP, ,/,, and/CC, proved/VBD, what/WP, confidence/NN, we/PRP, might/MD, place/VB, in/IN, him/PRP, ./.]
    [The/DT, 30th/JJ, of/IN, June/NNP, ,/,, the/DT, frigate/NN, spoke/VBD, some/DT, American/JJ, whalers/NNS, ,/,, from/IN, whom/WP, we/PRP, learned/VBD, that/IN, they/PRP, knew/VBD, nothing/NN, about/IN, the/DT, narwhal/NN, ./.]
    [But/CC, one/CD, of/IN, them/PRP, ,/,, the/DT, captain/NN, of/IN, the/DT, Monroe/NNP, ,/,, knowing/VBG, that/IN, Ned/NNP, Land/NNP, had/VBD, shipped/VBN, on/IN, board/NN, the/DT, Abraham/NNP, Lincoln/NNP, ,/,, begged/VBN, for/IN, his/PRP$, help/NN, in/IN, chasing/VBG, a/DT, whale/NN, they/PRP, had/VBD, in/IN, sight/NN, ./.]

Alternatively, we can use the Sentence class's listToString method to convert the tagged sentence to a simple String object.

Passing false as the second argument makes the method build the string from each element's toString result, which includes the tag, rather than from the word value alone, as shown here:

for (List<HasWord> sentence : sentences) {
    List<TaggedWord> taggedSentence = tagger.tagSentence(sentence);
    System.out.println(Sentence.listToString(taggedSentence, false));
}

This produces a more aesthetically pleasing output:

    The/DT voyage/NN of/IN the/DT Abraham/NNP Lincoln/NNP was/VBD for/IN a/DT long/JJ time/NN marked/VBN by/IN no/DT special/JJ incident/NN ./.
    But/CC one/CD circumstance/NN happened/VBD which/WDT showed/VBD the/DT wonderful/JJ dexterity/NN of/IN Ned/NNP Land/NNP ,/, and/CC proved/VBD what/WP confidence/NN we/PRP might/MD place/VB in/IN him/PRP ./.
    The/DT 30th/JJ of/IN June/NNP ,/, the/DT frigate/NN spoke/VBD some/DT American/JJ whalers/NNS ,/, from/IN whom/WP we/PRP learned/VBD that/IN they/PRP knew/VBD nothing/NN about/IN the/DT narwhal/NN ./.
    But/CC one/CD of/IN them/PRP ,/, the/DT captain/NN of/IN the/DT Monroe/NNP ,/, knowing/VBG that/IN Ned/NNP Land/NNP had/VBD shipped/VBN on/IN board/NN the/DT Abraham/NNP Lincoln/NNP ,/, begged/VBN for/IN his/PRP$ help/NN in/IN chasing/VBG a/DT whale/NN they/PRP had/VBD in/IN sight/NN ./.

Placing the following code sequence inside the sentence loop produces the same results. The word and tag methods extract the words and their tags:

List<TaggedWord> taggedSentence = tagger.tagSentence(sentence);
for (TaggedWord taggedWord : taggedSentence) {
    System.out.print(taggedWord.word() + "/" + taggedWord.tag() + " ");
}
System.out.println();
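The word/TAG text format is plain enough to be read back without the tagger. The following is a minimal sketch in plain Java, not part of the Stanford API; the class name TaggedLineParser and its parse method are hypothetical names chosen for illustration. It splits each token on its last slash, so that punctuation tokens such as ,/, and ./. are handled the same way as ordinary words:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Stanford API.
public class TaggedLineParser {

    // Splits one "word/TAG word/TAG ..." line into {word, tag} pairs.
    static List<String[]> parse(String line) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            // Use the last slash: the tag never contains one in this output.
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                continue; // skip tokens that have no word before a slash
            }
            pairs.add(new String[] {
                token.substring(0, slash), token.substring(slash + 1)});
        }
        return pairs;
    }

    public static void main(String[] args) {
        String line = "But/CC one/CD of/IN them/PRP ,/, the/DT captain/NN";
        for (String[] pair : parse(line)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Within a program that already holds TaggedWord objects, the word and tag methods remain the more direct route; a parser like this is only useful when the tagged text has been saved and needs to be reloaded.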

If we are only interested in words with particular tags, we can use a sequence such as the following. Because it tests whether the tag starts with NN, it lists all of the nouns: singular (NN), plural (NNS), and proper (NNP and NNPS):

List<TaggedWord> taggedSentence = tagger.tagSentence(sentence);
System.out.print("NN Tagged: ");
for (TaggedWord taggedWord : taggedSentence) {
    // Matches NN, NNS, NNP, and NNPS
    if (taggedWord.tag().startsWith("NN")) {
        System.out.print(taggedWord.word() + " ");
    }
}
System.out.println();

The nouns are displayed for each sentence, as shown here:

    NN Tagged: voyage Abraham Lincoln time incident 
    NN Tagged: circumstance dexterity Ned Land confidence 
    NN Tagged: June frigate whalers nothing narwhal 
    NN Tagged: captain Monroe Ned Land board Abraham Lincoln help whale sight
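The choice between a prefix test and an exact match changes which words are reported. The following plain Java sketch works on a word/TAG string rather than on TaggedWord objects, so it runs without the Stanford library; the class NounFilter and its wordsWithTag method are hypothetical names used only for illustration. With a prefix test, NN also matches NNP and NNS, while an exact match keeps only the singular common nouns:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Stanford API.
public class NounFilter {

    // Returns the words in a "word/TAG ..." line whose tag matches.
    // prefixMatch = true behaves like startsWith; false requires equality.
    static List<String> wordsWithTag(String line, String tag,
            boolean prefixMatch) {
        List<String> words = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                continue; // not a word/TAG token
            }
            String found = token.substring(slash + 1);
            boolean matches = prefixMatch
                    ? found.startsWith(tag) : found.equals(tag);
            if (matches) {
                words.add(token.substring(0, slash));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "The/DT 30th/JJ of/IN June/NNP ,/, the/DT frigate/NN "
                + "spoke/VBD some/DT American/JJ whalers/NNS";
        System.out.println(wordsWithTag(line, "NN", true));  // [June, frigate, whalers]
        System.out.println(wordsWithTag(line, "NN", false)); // [frigate]
    }
}
```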
