Using the SentenceDetectorME class

A model is loaded from a file using the SentenceModel class. An instance of the SentenceDetectorME class is then created using the model, and the sentDetect method is invoked to perform SBD. The method returns an array of strings, with each element holding a sentence.

This process is demonstrated in the following example. A try-with-resources block opens the en-sent.bin file, which contains the model. Inside the block, the paragraph string is processed and a for-each statement displays the resulting sentences; the catch clauses handle any IO exceptions raised while opening or reading the file:

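// Imports required by this snippet
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

try (InputStream is = new FileInputStream(
        new File(getModelDir(), "en-sent.bin"))) {
    // Load the pre-trained model and build the detector from it
    SentenceModel model = new SentenceModel(is);
    SentenceDetectorME detector = new SentenceDetectorME(model);
    // Split the paragraph into sentences
    String[] sentences = detector.sentDetect(paragraph);
    for (String sentence : sentences) {
        System.out.println(sentence);
    }
} catch (FileNotFoundException ex) {
    // Handle exception
} catch (IOException ex) {
    // Handle exception
}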

On execution, we get the following output:

    When determining the end of sentences we need to consider several factors.
    Sentences may end with exclamation marks!
    Or possibly questions marks?
    Within sentences we may find numbers like 3.14159, abbreviations such as found in Mr. Smith, and possibly ellipses either within a sentence ..., or at the end of a sentence...
  

The detector worked well for this paragraph, handling both the simple sentences and the more complex one. Of course, the text to be processed is not always so clean. The following paragraph has extra spaces in some places and is missing spaces where it needs them. This problem is likely to occur in the analysis of chat sessions:

paragraph = " This sentence starts with spaces and ends with "  
    + "spaces . This sentence has no spaces between the next " 
    + "one.This is the next one."; 

When we use this paragraph with the previous example, we get the following output:

    This sentence starts with spaces and ends with spaces  .
    This sentence has no spaces between the next one.This is the next one.
  

The leading spaces of the first sentence were removed, but the ending spaces were not. The third sentence was not detected and was merged with the second sentence.
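One way to reduce this kind of problem is to tidy the text before passing it to sentDetect. The following is only a minimal sketch and is not part of OpenNLP: the WhitespaceCleaner class and its clean method are hypothetical names, and the three regular expressions are simple heuristics. In particular, the last rule assumes that a lowercase letter, a terminator, and an uppercase letter with no space between them mark a missing sentence break, which will be wrong for some inputs (dotted identifiers, for example).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not an OpenNLP class: heuristic whitespace cleanup
class WhitespaceCleaner {
    // One or more whitespace characters in a row
    private static final Pattern RUNS = Pattern.compile("\\s+");
    // Whitespace that appears directly before a sentence terminator
    private static final Pattern SPACE_BEFORE_END = Pattern.compile("\\s+([.!?])");
    // Lowercase letter + terminator immediately followed by an uppercase letter
    private static final Pattern MISSING_SPACE = Pattern.compile("([a-z][.!?])([A-Z])");

    static String clean(String text) {
        // Trim the ends and collapse runs of whitespace to a single space
        String result = RUNS.matcher(text.trim()).replaceAll(" ");
        // Drop whitespace before '.', '!' or '?'
        result = SPACE_BEFORE_END.matcher(result).replaceAll("$1");
        // Insert the space that is assumed to be missing between sentences
        result = MISSING_SPACE.matcher(result).replaceAll("$1 $2");
        return result;
    }

    public static void main(String[] args) {
        String paragraph = " This sentence starts with spaces and ends with "
            + "spaces . This sentence has no spaces between the next "
            + "one.This is the next one.";
        System.out.println(clean(paragraph));
    }
}
```

For the paragraph above, clean returns text in which the stray space before the first period is gone and "one.This" has become "one. This". The cleaned string could then be passed to sentDetect in place of the raw paragraph; whether the detector then finds all three sentences still depends on the model.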

The getSentenceProbabilities method returns an array of doubles, one per sentence, representing the detector's confidence in each sentence found by the most recent call to the sentDetect method. Add the following code after the for-each statement that displayed the sentences:

double[] probabilities = detector.getSentenceProbabilities();
for (double probability : probabilities) {
    System.out.println(probability);
}

Executing this with the original paragraph produces the following output:

    0.9841708738988814
    0.908052385070974
    0.9130082376342675
    1.0

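Each number is the probability that expresses the detector's confidence in the corresponding sentence, listed in the same order in which the sentences were displayed.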
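Because the probabilities line up with the sentences by index, they can be used to single out sentences that may deserve a second look. The sketch below illustrates the idea with plain arrays so that it runs without a model file; the LowConfidenceFilter class, its belowThreshold method, and the 0.95 cut-off are all hypothetical choices, not part of OpenNLP. In real use, the two arrays would come from sentDetect and getSentenceProbabilities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an OpenNLP class
class LowConfidenceFilter {
    // Returns the sentences whose probability falls below the threshold
    static List<String> belowThreshold(String[] sentences,
            double[] probabilities, double threshold) {
        if (sentences.length != probabilities.length) {
            throw new IllegalArgumentException(
                "Expected one probability per sentence");
        }
        List<String> flagged = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            if (probabilities[i] < threshold) {
                flagged.add(sentences[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Sentences and probabilities copied from the output shown earlier
        String[] sentences = {
            "When determining the end of sentences we need to consider several factors.",
            "Sentences may end with exclamation marks!",
            "Or possibly questions marks?",
            "Within sentences we may find numbers like 3.14159, abbreviations "
                + "such as found in Mr. Smith, and possibly ellipses either "
                + "within a sentence ..., or at the end of a sentence..."
        };
        double[] probabilities = {
            0.9841708738988814, 0.908052385070974, 0.9130082376342675, 1.0
        };
        // 0.95 is an arbitrary cut-off chosen for illustration
        for (String sentence : belowThreshold(sentences, probabilities, 0.95)) {
            System.out.println(sentence);
        }
    }
}
```

With the values above, only the second and third sentences fall below the cut-off. A suitable threshold depends on the model and the text, so it is best treated as a tuning parameter.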
