Using the RegExChunker class of LingPipe

The RegExChunker class uses a regular expression to find chunks, that is, entities, in text. The regular expression describes what an entity looks like. Its chunk method returns a Chunking object that can be used just as we used it in our earlier examples.

The RegExChunker class's constructor takes three arguments:

  • String: This is the regular expression that describes the entity
  • String: This is the type of entity, or category, assigned to each match
  • double: This is the score assigned to each match

We will demonstrate this class using a regular expression representing time in the following example. The regular expression is the same as that used in the Using Java's regular expressions to find entities section earlier in this chapter. The Chunker instance is then created:

String timeRE = 
    "(([0-1]?[0-9])|([2][0-3])):([0-5]?[0-9])(:([0-5]?[0-9]))?"; 
Chunker chunker = new RegExChunker(timeRE, "time", 1.0); 

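The chunk method is used, along with the displayChunkSet method, as shown here. Note that displayChunkSet calls chunk itself, so the first two statements only illustrate the calls it makes internally: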

Chunking chunking = chunker.chunk(regularExpressionText); 
Set<Chunk> chunkSet = chunking.chunkSet(); 
displayChunkSet(chunker, regularExpressionText); 

The displayChunkSet method is shown in the following code segment. The chunkSet method returns a Set of Chunk instances. For each chunk, we display its type, the matched text (extracted using the chunk's start and end offsets), and its score:

public void displayChunkSet(Chunker chunker, String text) { 
    Chunking chunking = chunker.chunk(text); 
    Set<Chunk> set = chunking.chunkSet(); 
    for (Chunk chunk : set) { 
        System.out.println("Type: " + chunk.type() + " Entity: [" 
             + text.substring(chunk.start(), chunk.end()) 
             + "] Score: " + chunk.score()); 
    } 
} 

The output is as follows:

    Type: time Entity: [8:00] Score: 1.0
    Type: time Entity: [4:30] Score: 1.0
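
To check the expression independently of LingPipe, we can run it through the standard java.util.regex classes. The following is a minimal sketch rather than part of the LingPipe API: the TimeRegexDemo class, its findTimes method, and the sample sentence are all hypothetical names and data chosen for illustration. The sample text is not the regularExpressionText used earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; it uses only the JDK, not LingPipe.
class TimeRegexDemo {
    // The same expression as timeRE in this section.
    static final Pattern TIME = Pattern.compile(
        "(([0-1]?[0-9])|([2][0-3])):([0-5]?[0-9])(:([0-5]?[0-9]))?");

    // Collects the text of every match, in order of appearance.
    static List<String> findTimes(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = TIME.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed sample sentence, for illustration only.
        String sample =
            "He left at 8:00 and came back at 4:30 in the afternoon.";
        Matcher matcher = TIME.matcher(sample);
        while (matcher.find()) {
            // start() is inclusive and end() is exclusive, so they can be
            // passed to substring in the same way as a chunk's offsets.
            System.out.println("Entity: ["
                + sample.substring(matcher.start(), matcher.end())
                + "] at " + matcher.start() + "-" + matcher.end());
        }
    }
}
```

For the sample sentence, findTimes returns the two strings 8:00 and 4:30. The expression does not anchor its matches on word boundaries, so it can also match part of a longer digit sequence. This is worth keeping in mind when applying it to arbitrary text.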

Alternatively, we can declare a simple class that encapsulates the regular expression, which makes it easier to reuse elsewhere. The TimeRegexChunker class declared next extends RegExChunker and identifies time entities:

public class TimeRegexChunker extends RegExChunker { 
    private final static String TIME_RE =  
      "(([0-1]?[0-9])|([2][0-3])):([0-5]?[0-9])(:([0-5]?[0-9]))?"; 
    private final static String CHUNK_TYPE = "time"; 
    private final static double CHUNK_SCORE = 1.0; 
     
    public TimeRegexChunker() { 
        super(TIME_RE,CHUNK_TYPE,CHUNK_SCORE); 
    } 
} 

To use this class, replace this section's initial declaration of chunker with the following declaration:

Chunker chunker = new TimeRegexChunker(); 

The output will be the same as before.
