Using the Stanford API for NER

We will use the CRFClassifier class to perform NER. This class implements a linear-chain conditional random field (CRF) sequence model.

We start by declaring a string that holds the path to the classifier model file, as shown here:

String model = getModelDir() +
    "\\english.conll.4class.distsim.crf.ser.gz";

The classifier is then created using the model:

CRFClassifier<CoreLabel> classifier = 
    CRFClassifier.getClassifierNoExceptions(model);
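
The model path used above relies on a Windows-style backslash separator. As a minimal sketch, assuming the model file sits directly inside the model directory, the path can instead be assembled with java.nio.file.Paths so that the platform's own separator is used. The getModelDir method below is a hypothetical stand-in that returns a placeholder directory name; it is not the helper used in this chapter:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ModelPathExample {
    // Hypothetical stand-in for the getModelDir() helper used in the text;
    // "models" is a placeholder directory name.
    static String getModelDir() {
        return "models";
    }

    // Joins the directory and the model file name with the platform's separator.
    static String modelPath(String modelDir) {
        Path path = Paths.get(modelDir, "english.conll.4class.distsim.crf.ser.gz");
        return path.toString();
    }

    public static void main(String[] args) {
        System.out.println(modelPath(getModelDir()));
    }
}
```

The resulting string can be passed to getClassifierNoExceptions in place of model.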

The classify method takes a single string containing the text to be processed. Since sentences holds the text as separate strings, we first concatenate them into one string:

String sentence = ""; 
for (String element : sentences) { 
    sentence += element; 
} 
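
Each += in the loop above creates a new String object. As a sketch of an alternative that produces the same result, a StringBuilder can accumulate the text instead. The sample sentences in main are illustrative placeholders, not the chapter's data:

```java
final class SentenceJoiner {
    // Concatenates the elements in order without adding separators,
    // which matches the behavior of the += loop.
    static String join(String[] sentences) {
        StringBuilder builder = new StringBuilder();
        for (String element : sentences) {
            builder.append(element);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // Placeholder input; each element ends with a space so that
        // the sentences do not run together.
        String[] sentences = {
            "First sample sentence. ",
            "Second sample sentence. "
        };
        System.out.println(join(sentences));
    }
}
```

Because no separator is added, each element needs its own trailing whitespace if the sentences are to stay apart in the joined text.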

The classify method is then applied to the text:

List<List<CoreLabel>> entityList = classifier.classify(sentence); 

The classify method returns a List of List instances of CoreLabel objects. Each inner list represents one sentence of the text, and each CoreLabel in it represents a word with additional information attached. In the following code sequence, the outer for-each statement iterates over the sentences through the reference variable internalList, and the inner for-each statement processes each word in that sentence. The word method returns the text of the word, and the get method, called with CoreAnnotations.AnswerAnnotation.class, returns the category assigned to the word.

The words and their types are then displayed:

for (List<CoreLabel> internalList: entityList) { 
    for (CoreLabel coreLabel : internalList) { 
        String word = coreLabel.word(); 
        String category = coreLabel.get( 
            CoreAnnotations.AnswerAnnotation.class); 
        System.out.println(word + ":" + category); 
    } 
} 

Part of the output follows. It has been truncated because every word is displayed. The O category marks words that do not belong to any of the named entity types:

    Joe:PERSON
    was:O
    the:O
    last:O
    person:O
    to:O
    see:O
    Fred:PERSON
    .:O

    He:O
    ...
    look:O
    for:O
    Fred:PERSON

To filter out the words that are not named entities, replace the println statement with the following statement. It skips every word in the O category:

if (!"O".equals(category)) { 
    System.out.println(word + ":" + category); 
} 

The output is simpler now:

Joe:PERSON
Fred:PERSON
Boston:LOCATION
McKenzie:PERSON
Joe:PERSON
Vermont:LOCATION
IBM:ORGANIZATION
Sally:PERSON
Fred:PERSON  
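
The filtered words can also be collected rather than printed. The following is a minimal sketch that groups words by category while skipping the O category. It uses only the standard library; the Token class is a hypothetical stand-in for CoreLabel, holding just a word and its category, and the sample tokens are taken from the output above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EntityGrouping {
    // Hypothetical stand-in for CoreLabel: a word plus its category.
    static final class Token {
        final String word;
        final String category;

        Token(String word, String category) {
            this.word = word;
            this.category = category;
        }
    }

    // Groups words by category in first-seen order, skipping the O category.
    static Map<String, List<String>> groupByCategory(List<List<Token>> sentences) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (List<Token> sentence : sentences) {
            for (Token token : sentence) {
                if (!"O".equals(token.category)) {
                    groups.computeIfAbsent(token.category, k -> new ArrayList<>())
                          .add(token.word);
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<Token>> sentences = Arrays.asList(
            Arrays.asList(new Token("Joe", "PERSON"), new Token("was", "O"),
                          new Token("Fred", "PERSON")),
            Arrays.asList(new Token("Boston", "LOCATION"),
                          new Token("IBM", "ORGANIZATION")));
        System.out.println(groupByCategory(sentences));
    }
}
```

With real classifier output, the same loop body would read the word and category from each CoreLabel using the word and get methods shown earlier.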