Using the Stanford pipeline

In this section, we will discuss the Stanford pipeline in more detail. Although we have used it in several examples in this book, we have not fully explored its capabilities. Having used the pipeline before, you are now in a better position to understand how it works, and after reading this section you will be able to better assess its capabilities and applicability to your needs.

The edu.stanford.nlp.pipeline package holds the StanfordCoreNLP and annotator classes. The general approach is to process a text string in three steps: a Properties object holds the names of the annotators to run, an Annotation object represents the text to be processed, and the StanfordCoreNLP class's annotate method applies the annotators specified in the properties to that Annotation.

The CoreMap interface is the base interface of all annotatable objects. It stores annotations as key-value pairs. A hierarchy of the classes and interfaces is shown in the following diagram:

The diagram is a simplified version of the relationship between the classes and interfaces. The CoreLabel class implements the CoreMap interface and represents a single word with annotation information attached to it. Which information is attached depends on the properties set when the pipeline is created. However, positional information is always available, such as the token's beginning and ending positions and the whitespace before and after it. The get method of both CoreMap and CoreLabel is generic rather than overloaded: it takes an annotation key class as its argument, and the type of the value it returns is determined by that key. We have already used the CoreLabel class to access the individual words of a sentence.
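To make the class-keyed get method more concrete, here is a minimal, library-free sketch of the idea. It is not CoreNLP's implementation: the names TypedMapSketch, Key, WordKey, BeginOffsetKey, and MiniMap are hypothetical and exist only for this illustration. Each key class declares the type of value it maps to, so a single generic get method can return a String for one key and an Integer for another.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; all names here are hypothetical, not CoreNLP classes.
class TypedMapSketch {

    // Marker interface: a key class declares the value type V it is paired with.
    interface Key<V> {}

    // Two example keys, loosely modelled on a word and its start offset.
    static class WordKey implements Key<String> {}
    static class BeginOffsetKey implements Key<Integer> {}

    // A tiny map keyed by key classes rather than by key instances.
    static class MiniMap {
        private final Map<Class<?>, Object> values = new LinkedHashMap<>();

        // The compiler only accepts a value whose type matches the key's V.
        <V> void set(Class<? extends Key<V>> key, V value) {
            values.put(key, value);
        }

        // Returns the value stored for the key, or null if the key is absent.
        @SuppressWarnings("unchecked")
        <V> V get(Class<? extends Key<V>> key) {
            return (V) values.get(key);
        }

        // The key classes currently held, in insertion order.
        Set<Class<?>> keySet() {
            return values.keySet();
        }
    }

    public static void main(String[] args) {
        MiniMap token = new MiniMap();
        token.set(WordKey.class, "robber");
        token.set(BeginOffsetKey.class, 4);

        // No casts are needed: the key class fixes the return type.
        String word = token.get(WordKey.class);
        int begin = token.get(BeginOffsetKey.class);
        System.out.println(word + " starts at offset " + begin);
    }
}
```

The unchecked cast inside get is safe in this sketch because set is the only way to store a value, and it enforces that the value type matches the key. Asking for a key that was never set returns null, so a caller that unboxes the result into a primitive should check for that first.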

We will use the keySet method, which returns a set of all of the annotation keys currently held by the Annotation object. The keys are displayed before and after the annotate method is applied. The full working code is shown here:

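import java.util.List;
import java.util.Properties;
import java.util.Set;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class PipelineKeysExample {
    public static void main(String[] args) {
        String text = "The robber took the cash and ran";
        // The "annotators" property lists the annotators to run, in order
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation annotation = new Annotation(text);

        // Keys held by the Annotation before any annotator has run
        System.out.println("Before annotate method executed ");
        Set<Class<?>> annotationSet = annotation.keySet();
        for (Class<?> c : annotationSet) {
            System.out.println(" Class: " + c.getName());
        }

        pipeline.annotate(annotation);

        // Keys held by the Annotation after the pipeline has run
        System.out.println("After annotate method executed ");
        annotationSet = annotation.keySet();
        for (Class<?> c : annotationSet) {
            System.out.println(" Class: " + c.getName());
        }

        // Print each token followed by its part-of-speech tag
        List<CoreMap> sentences = annotation.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                String word = token.get(TextAnnotation.class);
                String pos = token.get(PartOfSpeechAnnotation.class);
                System.out.println(word);
                System.out.println(pos);
            }
        }
    }
}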

The following output shows the annotation keys before and after the call to annotate, followed by each word and its POS tag:

Before annotate method executed 
Class: edu.stanford.nlp.ling.CoreAnnotations$TextAnnotation
After annotate method executed
Class: edu.stanford.nlp.ling.CoreAnnotations$TextAnnotation
Class: edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation
Class: edu.stanford.nlp.ling.CoreAnnotations$SentencesAnnotation
Class: edu.stanford.nlp.ling.CoreAnnotations$MentionsAnnotation
Class: edu.stanford.nlp.coref.CorefCoreAnnotations$CorefMentionsAnnotation
Class: edu.stanford.nlp.ling.CoreAnnotations$CorefMentionToEntityMentionMappingAnnotation
Class: edu.stanford.nlp.ling.CoreAnnotations$EntityMentionToCorefMentionMappingAnnotation
Class: edu.stanford.nlp.coref.CorefCoreAnnotations$CorefChainAnnotation
The
DT
robber
NN
took
VBD
the
DT
cash
NN
and
CC
ran
VBD