Using the OpenNLP Tokenizer class

OpenNLP provides a Tokenizer interface that is implemented by three classes: SimpleTokenizer, TokenizerME, and WhitespaceTokenizer. The interface declares two methods:

  • tokenize: Takes the string to tokenize and returns the tokens as an
    array of strings.
  • tokenizePos: Takes the string to tokenize and returns an array of Span
    objects. Each Span specifies the beginning and ending offsets of one
    token within the input string.
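
The relationship between the two methods can be sketched without OpenNLP on the classpath. The following is a minimal illustration, not the library's implementation: the TokenizerSketch class and its nested Span class are hypothetical stand-ins, and splitting on whitespace is an assumption chosen for simplicity. The sketch also assumes an exclusive end offset, so that a token can be recovered with substring(start, end).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenizerSketch {

    // Hypothetical stand-in for a span: start is inclusive, end is exclusive (assumed).
    public static final class Span {
        public final int start;
        public final int end;

        public Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // Returns the piece of text that this span covers.
        public String coveredText(String text) {
            return text.substring(start, end);
        }
    }

    // Analogue of tokenizePos: returns the offsets of each whitespace-delimited token.
    public static Span[] tokenizePos(String text) {
        List<Span> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (start >= 0) {
                    spans.add(new Span(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        if (start >= 0) {
            spans.add(new Span(start, text.length()));
        }
        return spans.toArray(new Span[0]);
    }

    // Analogue of tokenize: the same tokens, returned as strings.
    public static String[] tokenize(String text) {
        Span[] spans = tokenizePos(text);
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = spans[i].coveredText(text);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Let's pause, and then reflect.";
        System.out.println(Arrays.toString(tokenize(text)));
        for (Span span : tokenizePos(text)) {
            System.out.println(span.start + ".." + span.end + " " + span.coveredText(text));
        }
    }
}
```

Returning offsets rather than strings is useful when the tokens must be mapped back to the original text, for example to highlight them or to align them with annotations.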

Each of these classes is demonstrated in the following sections.
