Using OpenNLP

Parsing text is simple using the ParserTool class. Its static parseLine method accepts three arguments and returns an array of Parse instances. The arguments are as follows:

  • A string containing the text to be parsed
  • A Parser instance
  • An integer specifying how many parses are to be returned

Each Parse instance holds the elements of one parse. The parses are returned in order of their probability, most probable first. To create a Parser instance, we will use the ParserFactory class's create method. This method uses a ParserModel instance that we will create using the en-parser-chunking.bin file.

This process is shown here, in which an input stream for the model file is created using a try-with-resources block. The ParserModel instance is created, followed by a Parser instance:

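import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;

String fileLocation = getModelDir() + "/en-parser-chunking.bin";
try (InputStream modelInputStream = new FileInputStream(fileLocation)) {
    // Load the model, then build a parser from it
    ParserModel model = new ParserModel(modelInputStream);
    Parser parser = ParserFactory.create(model);
    ...
} catch (IOException ex) {
    // Handle exceptions
}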

We will use a simple sentence to demonstrate the parsing process. In the following code sequence, the parseLine method is invoked using a value of 3 for the third argument. This will return the top three parses:

// ParserTool is in opennlp.tools.cmdline.parser; Parse is in opennlp.tools.parser
String sentence = "The cow jumped over the moon";
Parse[] parses = ParserTool.parseLine(sentence, parser, 3);

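Next, these parses are displayed along with their probabilities. For a whole parse, the getProb method reports a log probability, so the values are negative and those closer to zero indicate more likely parses: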

for (Parse parse : parses) {
    parse.show(); 
    System.out.println("Probability: " + parse.getProb()); 
} 

The output is as follows:

    (TOP (S (NP (DT The) (NN cow)) (VP (VBD jumped) (PP (IN over) (NP (DT the) (NN moon))))))
    Probability: -1.043506016751117
    (TOP (S (NP (DT The) (NN cow)) (VP (VP (VBD jumped) (PRT (RP over))) (NP (DT the) (NN moon)))))
    Probability: -4.248553665013661
    (TOP (S (NP (DT The) (NNS cow)) (VP (VBD jumped) (PP (IN over) (NP (DT the) (NN moon))))))
    Probability: -4.761071294573854
  

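Notice that each parse differs slightly in structure and in the tags assigned. For example, the second parse treats over as a particle (PRT) rather than the head of a prepositional phrase, and the third tags cow as NNS rather than NN. The following output shows the first parse formatted to make it easier to read: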

    (TOP
        (S
            (NP
                (DT The)
                (NN cow)
            )
            (VP
                (VBD jumped)
                (PP
                    (IN over)
                    (NP
                        (DT the)
                        (NN moon)
                    )
                )
            )
        )
    )
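
An indented layout like this one can also be generated from the bracketed string rather than written by hand. The following is a minimal sketch using only the standard library; the ParseIndenter class and its indent method are hypothetical names introduced for illustration and are not part of OpenNLP. It starts each node on its own line and leaves closing parentheses at the end of the line, so its layout differs slightly from the hand-formatted version above:

```java
// Hypothetical helper, not part of OpenNLP: indents a bracketed parse string.
final class ParseIndenter {

    // Starts every "(" on a new line, indented two spaces per nesting level.
    static String indent(String bracketed) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (int i = 0; i < bracketed.length(); i++) {
            char c = bracketed.charAt(i);
            if (c == '(') {
                // Drop the space that separated this node from the previous text
                int end = out.length();
                while (end > 0 && out.charAt(end - 1) == ' ') {
                    end--;
                }
                out.setLength(end);
                if (out.length() > 0) {
                    out.append('\n');
                }
                for (int d = 0; d < depth; d++) {
                    out.append("  ");
                }
                out.append(c);
                depth++;
            } else {
                if (c == ')') {
                    depth--;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The first parse from the output shown earlier
        String first = "(TOP (S (NP (DT The) (NN cow)) (VP (VBD jumped) "
            + "(PP (IN over) (NP (DT the) (NN moon))))))";
        System.out.println(indent(first));
    }
}
```

The sketch assumes well-formed input with balanced parentheses and does no validation.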
  

The showCodeTree method can be used instead to display parent-child relationships:

parse.showCodeTree(); 

The output for the first parse is shown here. Each line begins with the element's position in the tree, enclosed in brackets. The tag is displayed next, followed by two hash values separated by ->. The first number is for the element and the second is for its parent. For example, the third line shows the determiner, The, to have the noun phrase, The cow, as its parent:

[0] S -929208263 -> -929208263 TOP The cow jumped over the moon
[0.0] NP -929237012 -> -929208263 S The cow
[0.0.0] DT -929242488 -> -929237012 NP The
[0.0.0.0] TK -929242488 -> -929242488 DT The
[0.0.1] NN -929034400 -> -929237012 NP cow
[0.0.1.0] TK -929034400 -> -929034400 NN cow
[0.1] VP -928803039 -> -929208263 S jumped over the moon
[0.1.0] VBD -928822205 -> -928803039 VP jumped
[0.1.0.0] TK -928822205 -> -928822205 VBD jumped
[0.1.1] PP -928448468 -> -928803039 VP over the moon
[0.1.1.0] IN -928460789 -> -928448468 PP over
[0.1.1.0.0] TK -928460789 -> -928460789 IN over
[0.1.1.1] NP -928195203 -> -928448468 PP the moon
[0.1.1.1.0] DT -928202048 -> -928195203 NP the
[0.1.1.1.0.0] TK -928202048 -> -928202048 DT the
[0.1.1.1.1] NN -927992591 -> -928195203 NP moon
[0.1.1.1.1.0] TK -927992591 -> -927992591 NN moon  
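
To make the layout of these lines concrete, the following sketch splits one line into its fields with a regular expression. It is based only on the lines shown above, not on any documented format, and the CodeTreeLine class is a hypothetical name introduced for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: splits one showCodeTree line into fields.
final class CodeTreeLine {

    // Assumed layout: [path] TAG hash -> parentHash PARENT_TAG covered text
    private static final Pattern LINE = Pattern.compile(
        "^\\[([0-9.]+)\\] (\\S+) (-?\\d+) -> (-?\\d+) (\\S+) (.*)$");

    final String path;       // for example "0.0.0"
    final String tag;        // for example "DT"
    final long hash;         // hash value of this element
    final long parentHash;   // hash value of its parent
    final String parentTag;  // for example "NP"
    final String text;       // the text covered by this element

    private CodeTreeLine(Matcher m) {
        path = m.group(1);
        tag = m.group(2);
        hash = Long.parseLong(m.group(3));
        parentHash = Long.parseLong(m.group(4));
        parentTag = m.group(5);
        text = m.group(6);
    }

    static CodeTreeLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        return new CodeTreeLine(m);
    }

    // Number of components in the bracketed path
    int depth() {
        return path.split("\\.").length;
    }

    public static void main(String[] args) {
        CodeTreeLine line = parse("[0.0.0] DT -929242488 -> -929237012 NP The");
        System.out.println(line.tag + " \"" + line.text + "\" has parent "
            + line.parentTag + " at depth " + line.depth());
    }
}
```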

Another way of accessing the elements of the parse is through the getChildren method. This method returns an array of Parse objects, each representing a child element of the parse. Using various Parse methods, we can get each element's text, tag, and labels. Because parse here is the TOP element, it has a single child, the S element. This is illustrated here:

Parse[] children = parse.getChildren();
for (Parse parseElement : children) {
    System.out.println(parseElement.getText());
    System.out.println(parseElement.getType());
    Parse[] tags = parseElement.getTagNodes();
    System.out.println("Tags");
    for (Parse tag : tags) {
        System.out.println("[" + tag + "]"
            + " type: " + tag.getType()
            + "  Probability: " + tag.getProb()
            + "  Label: " + tag.getLabel());
    }
}

The output of this sequence is as follows:

The cow jumped over the moon
S
Tags
[The] type: DT  Probability: 0.9380626549164167  Label: null
[cow] type: NN  Probability: 0.9574993337971017  Label: null
[jumped] type: VBD  Probability: 0.9652983971550483  Label: S-VP
[over] type: IN  Probability: 0.7990638213315913  Label: S-PP
[the] type: DT  Probability: 0.9848023215770413  Label: null
[moon] type: NN  Probability: 0.9942338356992393  Label: null  
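
The tag probabilities above lie between 0 and 1, whereas the values printed earlier for whole parses were negative, which is consistent with their being log probabilities. The following sketch converts such scores to the 0 to 1 scale, assuming they are natural logarithms. That base is an assumption, and the LogProb class is a hypothetical helper introduced for illustration, not part of OpenNLP:

```java
// Hypothetical helper, not part of OpenNLP.
final class LogProb {

    // Assumes the score is a natural-log probability
    static double toProbability(double logProb) {
        return Math.exp(logProb);
    }

    public static void main(String[] args) {
        // Parse scores copied from the output of the three parses shown earlier
        double[] scores = {
            -1.043506016751117, -4.248553665013661, -4.761071294573854
        };
        for (double score : scores) {
            System.out.printf("%.6f -> %.4f%n", score, toProbability(score));
        }
    }
}
```

Because the exponential function is increasing, the conversion preserves the ranking of the parses: a score closer to zero still maps to a larger probability.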