LingPipe is a set of tools for performing common NLP tasks, with support for model training and testing. It is available in both royalty-free and licensed versions; production use of the royalty-free version is limited.
To demonstrate LingPipe, we will tokenize text with its Tokenizer class. Start by declaring two lists, one to hold the tokens and a second to hold the whitespace found around them:
List<String> tokenList = new ArrayList<>();
List<String> whiteList = new ArrayList<>();
Next, declare a string to hold the text to be tokenized:
String text = "A sample sentence processed by the " + "LingPipe tokenizer.";
Now, create an instance of the Tokenizer class. As shown in the following code block, the tokenizer method is called on INSTANCE, the static singleton of the IndoEuropeanTokenizerFactory class, and returns a Tokenizer over the text's character array:
Tokenizer tokenizer = IndoEuropeanTokenizerFactory.INSTANCE
    .tokenizer(text.toCharArray(), 0, text.length());
The tokenize method of this class is then used to populate the two lists:
tokenizer.tokenize(tokenList, whiteList);
Use a for-each statement to display the tokens:
for (String element : tokenList) {
    System.out.print(element + " ");
}
System.out.println();
The output of this example is shown here:
A sample sentence processed by the LingPipe tokenizer .
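To make the relationship between the two lists more concrete, the following is a minimal sketch that uses only the Java standard library. It is not LingPipe's implementation: the class name WhitespaceAwareTokenizerSketch, the TOKEN pattern, and the tokenize and rebuild helpers are all hypothetical stand-ins. The sketch assumes that a token is either a run of letters and digits or a single non-space symbol, and that the whitespace list records the text before, between, and after the tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; this is not LingPipe code.
class WhitespaceAwareTokenizerSketch {

    // Assumed token definition: a run of letters/digits, or one non-space symbol.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|\\S");

    // Adds each token to tokenList and the text preceding it to whiteList.
    // A final entry for the text after the last token is always added,
    // so whiteList ends up one element longer than tokenList.
    static void tokenize(String text, List<String> tokenList, List<String> whiteList) {
        Matcher matcher = TOKEN.matcher(text);
        int previousEnd = 0;
        while (matcher.find()) {
            whiteList.add(text.substring(previousEnd, matcher.start()));
            tokenList.add(matcher.group());
            previousEnd = matcher.end();
        }
        whiteList.add(text.substring(previousEnd));
    }

    // Rebuilds the original text by interleaving whitespace entries and tokens.
    static String rebuild(List<String> tokenList, List<String> whiteList) {
        StringBuilder sb = new StringBuilder(whiteList.get(0));
        for (int i = 0; i < tokenList.size(); i++) {
            sb.append(tokenList.get(i)).append(whiteList.get(i + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokenList = new ArrayList<>();
        List<String> whiteList = new ArrayList<>();
        String text = "A sample sentence processed by the " + "LingPipe tokenizer.";

        tokenize(text, tokenList, whiteList);

        for (String element : tokenList) {
            System.out.print(element + " ");
        }
        System.out.println();
        System.out.println("Round trip matches: " + rebuild(tokenList, whiteList).equals(text));
    }
}
```

For the sample sentence, this sketch produces 9 tokens and 10 whitespace entries, several of which are empty strings (for example, the entry between the last word and the period). Keeping the whitespace alongside the tokens is what allows the original text to be rebuilt exactly.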
A list of LingPipe links can be found in the following table:
LingPipe | Website
Home |
Tutorials |
JavaDocs |
Download |
Core |
Models |