OpenNLP provides several name finder models, as listed in the following table. These models can be downloaded from http://opennlp.sourceforge.net/models-1.5/.
The en prefix specifies English as the language and ner indicates that the model is for NER:
English finder models | Filename |
Location name finder model | en-ner-location.bin |
Money name finder model | en-ner-money.bin |
Organization name finder model | en-ner-organization.bin |
Percentage name finder model | en-ner-percentage.bin |
Person name finder model | en-ner-person.bin |
Time name finder model | en-ner-time.bin |
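The naming pattern in the table can be captured in a few lines. The following is only a minimal sketch: NerModelNames and modelFileName are hypothetical names introduced for illustration and are not part of OpenNLP. It assumes that every model follows the language-ner-entity.bin pattern shown above.

```java
// Hypothetical helper, not part of OpenNLP: builds a model filename
// from a language prefix and an entity type, following the
// <language>-ner-<entity>.bin pattern used in the table.
class NerModelNames {

    static String modelFileName(String language, String entityType) {
        return language + "-ner-" + entityType + ".bin";
    }

    public static void main(String[] args) {
        // The six entity types listed in the table
        String[] entityTypes = {
            "location", "money", "organization",
            "percentage", "person", "time"
        };
        for (String entityType : entityTypes) {
            System.out.println(modelFileName("en", entityType));
        }
    }
}
```

Running the sketch prints the six filenames from the table, one per line.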
If we modify the statement to load a different model file, we can see how each model performs against the sample sentences:
try (InputStream modelStream = new FileInputStream(new File(getModelDir(), "en-ner-time.bin"))) {
The various outputs are shown in the following table:
Model | Output |
en-ner-location.bin | Span: [4..5) location Entity: Boston Probability: 0.8656908776583051; Span: [5..6) location Entity: Vermont Probability: 0.9732488014011262 |
en-ner-money.bin | Span: [14..16) money Entity: 2.45 Probability: 0.7200919701507937 |
en-ner-organization.bin | Span: [16..17) organization Entity: IBM Probability: 0.9256970736336729 |
en-ner-time.bin | The model was not able to detect time in this text sequence |
When the en-ner-money.bin model is used, the span covers two tokens, [14..16): the dollar sign and the amount. The index into the tokens array in the earlier code sequence therefore has to be increased by 1. Otherwise, all that is returned is the dollar sign.
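The indexing issue is easier to see with a small example. The following sketch uses plain Java and does not call OpenNLP. The class SpanText, the method coveredText, and the short token array are all hypothetical and stand in for the real sample sentence. The span boundaries follow the [start..end) notation from the output table, where the end index is exclusive.

```java
import java.util.Arrays;

// Hypothetical illustration, not OpenNLP code: shows why reading only
// tokens[start] loses part of an entity that spans several tokens.
class SpanText {

    // Joins tokens[start] through tokens[end - 1] with single spaces.
    // The end index is exclusive, matching the [start..end) notation.
    static String coveredText(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        // Assumed tokens for illustration only
        String[] tokens = {"he", "paid", "$", "2.45", "for", "an", "ale"};
        int start = 2; // the money span here is [2..4)
        int end = 4;

        System.out.println(tokens[start]);                    // prints "$"
        System.out.println(tokens[start + 1]);                // prints "2.45"
        System.out.println(coveredText(tokens, start, end));  // prints "$ 2.45"
    }
}
```

Joining every token between the start and end indexes avoids hardcoding the offset of 1, so the same code also works for single-token entities. OpenNLP's Span class also has a static spansToStrings method that should serve a similar purpose when working with real Span objects.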
The en-ner-time.bin model failed to find any time entities in the sample text. This illustrates that the model did not have enough confidence to mark any token as a time entity.