Using LingPipe's named entity models

LingPipe provides several named entity models that can be used for chunking. Each model file contains a serialized object that is read from the file and then applied to text. These objects implement the Chunker interface. Chunking a piece of text produces a Chunking object whose chunks identify the entities of interest.

A list of NER models is found in the following table. These models can be downloaded from http://alias-i.com/lingpipe/web/models.html:

Genre             Corpus   File
English news      MUC-6    ne-en-news-muc6.AbstractCharLmRescoringChunker
English genes     GeneTag  ne-en-bio-genetag.HmmChunker
English genomics  GENIA    ne-en-bio-genia.TokenShapeChunker

We will use the model found in the ne-en-news-muc6.AbstractCharLmRescoringChunker file to demonstrate how such a model is used.

We will start with a try...catch block to deal with exceptions, as shown in the following example. The model file is passed to the static readObject method of the AbstractExternalizable class, which reads in the serialized model and returns an object that we cast to the Chunker interface:

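import java.io.File;
import java.io.IOException;
import com.aliasi.chunk.Chunker;
import com.aliasi.util.AbstractExternalizable;

try {
    // getModelDir() returns the directory that holds the downloaded model
    File modelFile = new File(getModelDir(),
        "ne-en-news-muc6.AbstractCharLmRescoringChunker");
    // Deserialize the model and treat it as a Chunker
    Chunker chunker = (Chunker)
        AbstractExternalizable.readObject(modelFile);
    ...
} catch (IOException | ClassNotFoundException ex) {
    // Handle exception
}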
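To make the idea of "a serialized object that is read from a file" concrete, the following is a minimal sketch that uses only standard Java serialization. It does not use LingPipe: the SerializedModelDemo and ToyModel names are hypothetical stand-ins, and LingPipe's readObject method may do additional work beyond what is shown here. The sketch writes a small object to a temporary file, reads it back, and casts the result, which is the same read-then-cast pattern used in the preceding example:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializedModelDemo {

    // Hypothetical stand-in for a trained model; this is not a LingPipe class.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String label;

        ToyModel(String label) {
            this.label = label;
        }
    }

    // Writes a serializable object to the given file.
    static void write(Serializable obj, File file) throws IOException {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(obj);
        }
    }

    // Reads an object back; the caller casts it to the expected type.
    static Object read(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    // Saves a ToyModel to a temporary file, restores it, and returns its label.
    static String roundTrip(String label) {
        try {
            File file = File.createTempFile("toy-model", ".ser");
            file.deleteOnExit();
            write(new ToyModel(label), file);
            ToyModel restored = (ToyModel) read(file);
            return restored.label;
        } catch (IOException | ClassNotFoundException ex) {
            // Same two exception types as in the model-loading example
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("news"));
    }
}
```

The cast is unchecked at compile time, so reading a file that holds a different kind of object would fail with a ClassCastException at run time.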

The Chunker and Chunking interfaces provide methods that work with a set of chunks of text. The chunk method of Chunker returns an object that implements the Chunking interface. The following sequence displays the chunks found in each sentence of the text, as shown here:

for (int i = 0; i < sentences.length; ++i) { 
    Chunking chunking = chunker.chunk(sentences[i]); 
    System.out.println("Chunking=" + chunking); 
} 

The output of this sequence is as follows:

    Chunking=Joe was the last person to see Fred.  : [0-3:PERSON@-Infinity, 31-35:ORGANIZATION@-Infinity]
    Chunking=He saw him in Boston at McKenzie's pub at 3:00 where he paid $2.45 for an ale.  : [14-20:LOCATION@-Infinity, 24-32:PERSON@-Infinity]
    Chunking=Joe wanted to go to Vermont for the day to visit a cousin who works at IBM, but Sally and he had to look for Fred : [0-3:PERSON@-Infinity, 20-27:ORGANIZATION@-Infinity, 71-74:ORGANIZATION@-Infinity, 109-113:ORGANIZATION@-Infinity]
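Each entry in the preceding output has the form start-end:TYPE@score, where start and end are character offsets into the sentence. Judging from the output, the start offset is inclusive and the end offset is exclusive, which matches the convention of Java's substring method. The following small sketch, which uses a hypothetical ChunkOffsets helper and no LingPipe classes, recovers the entity text from a few of the offsets shown:

```java
class ChunkOffsets {

    // Returns the text covered by a chunk: start is inclusive, end is exclusive.
    static String covered(String sentence, int start, int end) {
        return sentence.substring(start, end);
    }

    public static void main(String[] args) {
        String first = "Joe was the last person to see Fred.";
        String second = "He saw him in Boston at McKenzie's pub at 3:00 "
            + "where he paid $2.45 for an ale.";

        System.out.println(covered(first, 0, 3));     // Joe
        System.out.println(covered(first, 31, 35));   // Fred
        System.out.println(covered(second, 14, 20));  // Boston
        System.out.println(covered(second, 24, 32));  // McKenzie
    }
}
```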

Rather than printing each Chunking object as a whole, we can use methods of the Chunk class to extract specific pieces of information, as illustrated in the following code. We will replace the previous for statement with the following for-each statement. This calls the displayChunkSet method that was developed in the Using the RegExChunker class of LingPipe section earlier in this chapter:

for (String sentence : sentences) { 
    displayChunkSet(chunker, sentence); 
} 

The output that follows shows the result. Note that the model does not always assign the correct entity type: for example, Fred and Vermont are both tagged as ORGANIZATION:

Type: PERSON Entity: [Joe] Score: -Infinity
Type: ORGANIZATION Entity: [Fred] Score: -Infinity
Type: LOCATION Entity: [Boston] Score: -Infinity
Type: PERSON Entity: [McKenzie] Score: -Infinity
Type: PERSON Entity: [Joe] Score: -Infinity
Type: ORGANIZATION Entity: [Vermont] Score: -Infinity
Type: ORGANIZATION Entity: [IBM] Score: -Infinity
Type: ORGANIZATION Entity: [Fred] Score: -Infinity  
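The displayChunkSet method itself is not reproduced in this section. The following is only a sketch of how lines in the format shown above could be produced from a set of chunks. So that it runs without LingPipe on the classpath, it declares a hypothetical Chunk stand-in that mirrors the start, end, type, and score accessors; in real code, the chunks would be LingPipe Chunk objects taken from the Chunking returned by the chunker. The ChunkSetDisplay class and its chunk, describe, and demo methods are likewise assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ChunkSetDisplay {

    // Hypothetical stand-in that mirrors the accessors used here;
    // it is not the LingPipe Chunk interface itself.
    interface Chunk {
        int start();
        int end();
        String type();
        double score();
    }

    // Creates a stand-in chunk with fixed values.
    static Chunk chunk(final int start, final int end,
                       final String type, final double score) {
        return new Chunk() {
            public int start() { return start; }
            public int end() { return end; }
            public String type() { return type; }
            public double score() { return score; }
        };
    }

    // Builds one line per chunk in the "Type ... Entity ... Score" format.
    static List<String> describe(String text, Set<Chunk> chunks) {
        List<String> lines = new ArrayList<>();
        for (Chunk c : chunks) {
            lines.add("Type: " + c.type()
                + " Entity: [" + text.substring(c.start(), c.end())
                + "] Score: " + c.score());
        }
        return lines;
    }

    // Rebuilds the two chunks reported for the first sentence.
    static List<String> demo() {
        String text = "Joe was the last person to see Fred.";
        Set<Chunk> chunks = new LinkedHashSet<>();
        chunks.add(chunk(0, 3, "PERSON", Double.NEGATIVE_INFINITY));
        chunks.add(chunk(31, 35, "ORGANIZATION", Double.NEGATIVE_INFINITY));
        return describe(text, chunks);
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

Returning the lines as a list instead of printing them directly keeps the formatting logic easy to check separately from the chunker.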