Determining tag confidence with the HmmDecoder class

Statistical analysis can be performed using a lattice structure, which is useful for analyzing the alternative tags that could be assigned to each token. The lattice stores the forward/backward scores computed by the model. The HmmDecoder class's tagMarginal method returns an instance of the TagLattice class, which represents this lattice.
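
To make the idea of forward/backward scores more concrete, the following is a minimal sketch of how such scores can be combined into per-token tag probabilities. It is not LingPipe's implementation. The class name ForwardBackwardSketch, the three-tag model, and every probability in it are made up for illustration only:

```java
import java.util.Arrays;

class ForwardBackwardSketch {

    // Returns result[t][s]: the probability of state (tag) s at position t,
    // given the whole observation sequence.
    static double[][] marginals(double[] start, double[][] trans,
                                double[][] emit, int[] obs) {
        int n = obs.length;
        int k = start.length;
        double[][] fwd = new double[n][k];
        double[][] bwd = new double[n][k];

        // Forward pass: fwd[t][s] = P(obs[0..t], state at t is s)
        for (int s = 0; s < k; s++) {
            fwd[0][s] = start[s] * emit[s][obs[0]];
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    sum += fwd[t - 1][p] * trans[p][s];
                }
                fwd[t][s] = sum * emit[s][obs[t]];
            }
        }

        // Backward pass: bwd[t][s] = P(obs[t+1..n-1] | state at t is s)
        Arrays.fill(bwd[n - 1], 1.0);
        for (int t = n - 2; t >= 0; t--) {
            for (int s = 0; s < k; s++) {
                double sum = 0.0;
                for (int next = 0; next < k; next++) {
                    sum += trans[s][next] * emit[next][obs[t + 1]] * bwd[t + 1][next];
                }
                bwd[t][s] = sum;
            }
        }

        // Multiply the two scores and normalize each position to sum to 1
        double[][] result = new double[n][k];
        for (int t = 0; t < n; t++) {
            double total = 0.0;
            for (int s = 0; s < k; s++) {
                result[t][s] = fwd[t][s] * bwd[t][s];
                total += result[t][s];
            }
            for (int s = 0; s < k; s++) {
                result[t][s] /= total;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical toy model: the tags, words, and numbers are invented
        String[] tags = {"at", "nn", "vb"};
        String[] words = {"the", "force"};
        double[] start = {0.5, 0.2, 0.3};
        double[][] trans = {
            {0.05, 0.90, 0.05},   // from at
            {0.30, 0.30, 0.40},   // from nn
            {0.60, 0.30, 0.10}    // from vb
        };
        double[][] emit = {
            {0.98, 0.02},         // at emits the, force
            {0.05, 0.95},         // nn emits the, force
            {0.05, 0.95}          // vb emits the, force
        };
        int[] obs = {1, 0, 1};    // "force the force"

        double[][] m = marginals(start, trans, emit, obs);
        for (int t = 0; t < obs.length; t++) {
            System.out.printf("%-8s", words[obs[t]]);
            for (int s = 0; s < tags.length; s++) {
                System.out.printf("%7.3f/%-3s ", m[t][s], tags[s]);
            }
            System.out.println();
        }
    }
}
```

Each row of the result sums to 1, so the values can be read as the model's confidence in each tag at that position. This sketch multiplies raw probabilities, which is adequate for a three-token toy input but would underflow on long sequences.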

We can examine each token of the lattice using an instance of the ConditionalClassification class. In the following example, a loop calls the lattice's tokenClassification method to obtain the ConditionalClassification instance for each token.

We are using the same tokenList instance that we developed in the previous section:

TagLattice<String> lattice = decoder.tagMarginal(tokenList); 
for (int index = 0; index < tokenList.size(); index++) { 
    ConditionalClassification classification =  
        lattice.tokenClassification(index); 
    ... 
}

The ConditionalClassification class has a score and a category method. Both take a rank as their argument, where rank 0 is the most likely category. The score method returns the score of the category at that rank. The category method returns the category itself, which is the tag. The following code displays the token followed by the scores and tags of its four highest-ranked categories:

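// Inside the loop: print the token, then its top-ranked scores and tags
System.out.printf("%-8s", tokenList.get(index));
// Guard against a token that has fewer than four candidate tags
int limit = Math.min(4, classification.size());
for (int i = 0; i < limit; ++i) {
    double score = classification.score(i);   // score at rank i
    String tag = classification.category(i);  // tag at rank i
    System.out.printf("%7.3f/%-3s ", score, tag);
}
System.out.println();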

The output is shown as follows:

    Bill      0.974/np    0.018/nn    0.006/rb    0.001/nps 
    used      0.935/vbd   0.065/vbn   0.000/jj    0.000/rb  
    the       1.000/at    0.000/jj    0.000/pps   0.000/pp$$ 
    force     0.977/nn    0.016/jj    0.006/vb    0.001/rb  
    to        0.944/to    0.055/in    0.000/rb    0.000/nn  
    force     0.945/vb    0.053/nn    0.002/rb    0.001/jj  
    the       1.000/at    0.000/jj    0.000/vb    0.000/nn  
    manager   0.982/nn    0.018/jj    0.000/nn$   0.000/vb  
    to        0.988/to    0.012/in    0.000/rb    0.000/nn  
    tear      0.991/vb    0.007/nn    0.001/rb    0.001/jj  
    the       1.000/at    0.000/jj    0.000/vb    0.000/nn  
    bill      0.994/nn    0.003/jj    0.002/rb    0.001/nns 
    in        0.990/in    0.004/rp    0.002/nn    0.001/jj  
    two.      0.960/nn    0.013/np    0.011/nns   0.008/rb
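
One way to use these scores is to flag the tokens whose best tag falls below a chosen confidence level. The following is a minimal sketch that works on the top-ranked scores copied from the preceding output rather than on a live lattice. The class name LowConfidenceSketch, the lowConfidence helper, and the 0.95 threshold are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class LowConfidenceSketch {

    // Returns the indexes of the tokens whose top-ranked score is below the threshold.
    static List<Integer> lowConfidence(double[] topScores, double threshold) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < topScores.length; i++) {
            if (topScores[i] < threshold) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Tokens and rank-0 scores taken from the output shown above
        String[] tokens = {"Bill", "used", "the", "force", "to", "force", "the",
                           "manager", "to", "tear", "the", "bill", "in", "two."};
        double[] topScores = {0.974, 0.935, 1.000, 0.977, 0.944, 0.945, 1.000,
                              0.982, 0.988, 0.991, 1.000, 0.994, 0.990, 0.960};

        // The 0.95 cutoff is an arbitrary choice for this example
        for (int i : lowConfidence(topScores, 0.95)) {
            System.out.printf("%-8s %.3f%n", tokens[i], topScores[i]);
        }
    }
}
```

With this threshold, the sketch reports used (0.935), the first to (0.944), and the second force (0.945). In a real application, the scores would come from classification.score(0) inside the loop over the lattice.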
