Creating a dictionary from a file

If we need to create a new dictionary, one approach is to write an XML file containing all of the words and their tags and then create the dictionary from that file. OpenNLP supports this approach through the POSDictionary class's static create method.

The XML file consists of a dictionary root element that contains a series of entry elements. Each entry element uses its tags attribute to list the word's tags, separated by spaces, and contains the word itself as a token element. The root element's case_sensitive attribute indicates whether words are matched case-sensitively. A simple example with two words, stored in the dictionary.txt file, is as follows:

<dictionary case_sensitive="false"> 
    <entry tags="JJ VB"> 
        <token>strong</token> 
    </entry> 
    <entry tags="NN VBP VB"> 
        <token>force</token> 
    </entry> 
</dictionary>
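Rather than writing the file by hand, we can generate it from word and tag data we already have. The following is a minimal sketch that uses only the JDK and assumes the entries are held in a Map from each word to its list of tags. The class DictionaryXmlWriter and its toDictionaryXml method are hypothetical illustrations, not part of OpenNLP.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not an OpenNLP class) that builds dictionary XML.
public class DictionaryXmlWriter {

    // Escape characters that cannot appear verbatim in XML text or attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds the dictionary XML from an assumed word -> tags map.
    static String toDictionaryXml(Map<String, List<String>> entries, boolean caseSensitive) {
        StringBuilder xml = new StringBuilder();
        xml.append("<dictionary case_sensitive=\"").append(caseSensitive).append("\">\n");
        for (Map.Entry<String, List<String>> e : entries.entrySet()) {
            // The tags attribute holds the word's tags separated by spaces.
            xml.append("    <entry tags=\"")
               .append(escape(String.join(" ", e.getValue())))
               .append("\">\n");
            // The word itself goes in a token element inside the entry.
            xml.append("        <token>").append(escape(e.getKey())).append("</token>\n");
            xml.append("    </entry>\n");
        }
        xml.append("</dictionary>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the entries in insertion order.
        Map<String, List<String>> entries = new LinkedHashMap<>();
        entries.put("strong", List.of("JJ", "VB"));
        entries.put("force", List.of("NN", "VBP", "VB"));
        System.out.print(toDictionaryXml(entries, false));
    }
}
```

Running main prints the same XML as the example above. To save it, the string could be written to dictionary.txt, for example with java.nio.file.Files.writeString (Java 11 or later).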

To create the dictionary, we use the create method based on an input stream, as shown here:

try (InputStream dictionaryIn =
        new FileInputStream(new File("dictionary.txt"))) {
    POSDictionary dictionary = POSDictionary.create(dictionaryIn);
    // ...
} catch (IOException e) {
    // Handle exceptions
}

The POSDictionary class has an iterator method that returns an Iterator<String>, whose next method returns each word in the dictionary in turn. The getTags method returns the tags for a given word as a String array. We can use these methods to display the contents of the dictionary, as shown here:

// Place this where the dictionary variable is in scope, such as inside the try block
Iterator<String> iterator = dictionary.iterator();
while (iterator.hasNext()) {
    String entry = iterator.next();
    String[] tags = dictionary.getTags(entry);
    System.out.print(entry + " ");
    for (String tag : tags) {
        System.out.print("/" + tag);
    }
    System.out.println();
}

Executing this code produces output similar to the following:

  strong /JJ/VB
  force /NN/VBP/VB
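Before passing a hand-written file to POSDictionary.create, it can help to confirm that its structure is what we intended. The sketch below does not use OpenNLP and is not how OpenNLP parses the file; it reads the XML with the JDK's DOM parser and prints each entry in the same word /TAG format used above. The class name DictionaryFileCheck is hypothetical, and the sketch ignores the case_sensitive attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical checker (not part of OpenNLP) for the dictionary XML structure.
public class DictionaryFileCheck {

    // Returns one "word /TAG1/TAG2" line per entry element, in document order.
    static List<String> describe(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        List<String> lines = new ArrayList<>();
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            // The tags attribute holds space-separated tags.
            String[] tags = entry.getAttribute("tags").trim().split("\\s+");
            // Each entry is assumed to contain a single token element holding the word.
            String word = entry.getElementsByTagName("token").item(0).getTextContent().trim();
            StringBuilder line = new StringBuilder(word).append(" ");
            for (String tag : tags) {
                line.append("/").append(tag);
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<dictionary case_sensitive=\"false\">"
                + "<entry tags=\"JJ VB\"><token>strong</token></entry>"
                + "<entry tags=\"NN VBP VB\"><token>force</token></entry>"
                + "</dictionary>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // Prints "strong /JJ/VB" and "force /NN/VBP/VB"
        describe(in).forEach(System.out::println);
    }
}
```

To check the real file instead of the in-memory string, pass new FileInputStream("dictionary.txt") to describe.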