Using the WhitespaceTokenizer class

As its name implies, this class uses whitespace as the delimiter. In the following code sequence, the tokenizer's shared instance is obtained through the INSTANCE field, and its tokenize method is called with paragraph as input. The for-each statement then displays the tokens:

    String tokens[] = WhitespaceTokenizer.INSTANCE.tokenize(paragraph);
    for (String token : tokens) {
        System.out.println(token);
    }

The output is as follows:

    Let's
    pause,
    and
    then
    reflect.  

Although this does not separate contractions and similar units of text, it can be useful for some applications. The class also provides a tokenizePos method that returns the boundaries of the tokens as an array of Span objects.
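To make the idea of token boundaries concrete, the following is a minimal plain-Java sketch that computes start and end offsets for whitespace-delimited tokens. It is not the OpenNLP implementation and does not use the library: the WhitespaceSpans class and its tokenizePositions method are hypothetical names introduced for illustration, and the paragraph text is assumed from the output shown earlier. Each boundary is a pair of offsets, with the end offset treated as exclusive:

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceSpans {

    // Hypothetical helper (not part of OpenNLP): returns {start, end} pairs,
    // end exclusive, for each run of non-whitespace characters in the text.
    static int[][] tokenizePositions(String text) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means we are not currently inside a token
        for (int i = 0; i < text.length(); i++) {
            boolean ws = Character.isWhitespace(text.charAt(i));
            if (!ws && start < 0) {
                start = i;                        // a token begins here
            } else if (ws && start >= 0) {
                spans.add(new int[] {start, i});  // the token ended before i
                start = -1;
            }
        }
        if (start >= 0) {
            // The text ended while still inside a token
            spans.add(new int[] {start, text.length()});
        }
        return spans.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        // Assumed input, reconstructed from the tokens shown in the output
        String paragraph = "Let's pause, and then reflect.";
        for (int[] span : tokenizePositions(paragraph)) {
            System.out.println("[" + span[0] + ".." + span[1] + ") "
                    + paragraph.substring(span[0], span[1]));
        }
    }
}
```

Working with offsets rather than token strings keeps a link back to the original text, which can be useful when later processing steps need to highlight or annotate the source.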
