Hidden Markov Models (HMMs) are well suited to classification problems over generative sequences. In natural language processing, HMMs can be used for a variety of tasks such as phrase chunking, part-of-speech (POS) tagging, and information extraction from documents. If we treat the words of a sentence as the observed input sequence and the POS tags as the hidden states, with the tag assignments estimated from conditional probabilities, then POS tagging becomes a typical sequence classification problem that can be solved with an HMM.
According to Rabiner (1989), there are five elements needed to define an HMM:

1. $N$, the number of hidden states in the model
2. $M$, the number of distinct observation symbols per state
3. $A = \{a_{ij}\}$, the state transition probability distribution, where $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$
4. $B = \{b_j(k)\}$, the observation symbol probability distribution in state $j$, where $b_j(k) = P(v_k \text{ at } t \mid q_t = S_j)$
5. $\pi = \{\pi_i\}$, the initial state distribution, where $\pi_i = P(q_1 = S_i)$
Where $q_t$ denotes the current state, the transition probabilities should also satisfy the normal stochastic constraints:

$a_{ij} \ge 0, \quad 1 \le i, j \le N$

And:

$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$
Where $v_k$ denotes the kth observation symbol and $\lambda = (A, B, \pi)$ the current parameter vector, the following conditions must be satisfied:

$b_j(k) \ge 0, \quad 1 \le j \le N, \; 1 \le k \le M$

And:

$\sum_{k=1}^{M} b_j(k) = 1, \quad 1 \le j \le N$
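To make these constraints concrete, the following Python snippet builds a small two-state POS model (the state names and probability values are hypothetical, invented purely for illustration) and checks that $A$, $B$, and $\pi$ each satisfy the stochastic constraints above:

```python
# Hypothetical two-state POS model (N = noun-like, V = verb-like states)
pi = {"N": 0.6, "V": 0.4}                          # initial state distribution
A = {"N": {"N": 0.7, "V": 0.3},                    # state transition probabilities
     "V": {"N": 0.4, "V": 0.6}}
B = {"N": {"the": 0.4, "dog": 0.5, "runs": 0.1},   # emission probabilities b_j(k)
     "V": {"the": 0.1, "dog": 0.1, "runs": 0.8}}

# Every a_ij and b_j(k) is non-negative, and each row sums to 1
a_ok = all(abs(sum(row.values()) - 1.0) < 1e-12 for row in A.values())
b_ok = all(abs(sum(row.values()) - 1.0) < 1e-12 for row in B.values())
pi_ok = abs(sum(pi.values()) - 1.0) < 1e-12
```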
When implementing an HMM, floating-point underflow is a significant problem. When we apply the Viterbi or forward algorithms to long sequences, the resultant probability values are very small, which can underflow on most machines. We solve this problem differently for each algorithm, as shown in the following examples.
As the Viterbi algorithm only multiplies probabilities, a simple solution to underflow is to take the logarithm of all the probability values and then add instead of multiplying. In fact, if all the values in the model matrices (A, B, π) are logged ahead of time, then at runtime only addition operations are needed.
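The idea can be sketched as follows in Python (the two-state model, state names, and probabilities are hypothetical, chosen only to illustrate the technique; the same logic applies to any HMM):

```python
import math

# Hypothetical two-state POS model: N = noun-like, V = verb-like
pi = {"N": 0.6, "V": 0.4}
A = {"N": {"N": 0.7, "V": 0.3}, "V": {"N": 0.4, "V": 0.6}}
B = {"N": {"the": 0.4, "dog": 0.5, "runs": 0.1},
     "V": {"the": 0.1, "dog": 0.1, "runs": 0.8}}

# Log the model matrices once, up front ...
log_pi = {s: math.log(p) for s, p in pi.items()}
log_A = {r: {s: math.log(p) for s, p in row.items()} for r, row in A.items()}
log_B = {s: {o: math.log(p) for o, p in row.items()} for s, row in B.items()}

def viterbi_log(obs, states, log_pi, log_A, log_B):
    # delta[s]: best log probability of any state path ending in state s
    delta = {s: log_pi[s] + log_B[s][obs[0]] for s in states}
    backptr = []
    for o in obs[1:]:
        prev, delta, bp = delta, {}, {}
        for s in states:
            # ... so at runtime only additions (and max) are needed
            best = max(states, key=lambda r: prev[r] + log_A[r][s])
            delta[s] = prev[best] + log_A[best][s] + log_B[s][o]
            bp[s] = best
        backptr.append(bp)
    # Backtrack from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for bp in reversed(backptr):
        path.append(bp[path[-1]])
    return list(reversed(path)), delta[last]

path, logp = viterbi_log(["the", "dog", "runs"], ["N", "V"],
                         log_pi, log_A, log_B)
```

Because the returned score is a log probability, it stays in a comfortable numeric range even for very long sequences, where the raw product would underflow.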
The forward algorithm sums probability values, so logging the values is not a viable way to avoid underflow. The most common solution to this problem is to use scaling coefficients that keep the probability values within the dynamic range of the machine, and that depend only on t. The coefficient is defined as:

$c_t = \dfrac{1}{\sum_{i=1}^{N} \alpha_t(i)}$

And thus the new scaled value for $\alpha$ becomes:

$\hat{\alpha}_t(i) = c_t\,\alpha_t(i) = \dfrac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}$
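A minimal Python sketch of the scaled forward pass follows (the toy two-state model is hypothetical, used only for illustration). The key points are that each $\alpha$ vector is renormalized by $c_t$ at every step, and the log-likelihood is recovered as $\log P(O \mid \lambda) = \sum_t \log(1/c_t)$, so no intermediate value ever underflows:

```python
import math

def forward_scaled(obs, states, pi, A, B):
    # Usual initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = {s: pi[s] * B[s][obs[0]] for s in states}
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            prev = alpha
            alpha = {s: sum(prev[r] * A[r][s] for r in states) * B[s][obs[t]]
                     for s in states}
        c = 1.0 / sum(alpha.values())              # c_t = 1 / sum_i alpha_t(i)
        alpha = {s: c * a for s, a in alpha.items()}  # scaled alphas sum to 1
        log_like += math.log(1.0 / c)              # accumulate log P(O)
    return log_like

# Hypothetical two-state model, same shape as in the text
pi = {"N": 0.6, "V": 0.4}
A = {"N": {"N": 0.7, "V": 0.3}, "V": {"N": 0.4, "V": 0.6}}
B = {"N": {"the": 0.4, "dog": 0.5, "runs": 0.1},
     "V": {"the": 0.1, "dog": 0.1, "runs": 0.8}}

log_like = forward_scaled(["the", "dog", "runs"], ["N", "V"], pi, A, B)
```

Exponentiating the result recovers the sequence likelihood for short inputs, but in practice the log-likelihood itself is what gets reported and compared.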
The process of chunking consists of dividing a text into syntactically correlated groups of words, such as noun groups and verb groups, without specifying their internal structure or their role in the main sentence. We can use the Maxent_Chunk_Annotator() function from the openNLP R package to accomplish this. Before we can use this method, we have to POS tag the sentence, using the same steps performed previously for POS tagging:
chunkAnnotator <- Maxent_Chunk_Annotator(language = "en", probs = FALSE, model = NULL)
chunkedSentence <- annotate(s, chunkAnnotator, posTaggedSentence)
head(chunkedSentence)
The chunk tags provide some more useful information:
The chunk tags contain the name of the chunk type: for example, I-NP for noun phrase words and I-VP for verb phrase words. Most chunk types have two kinds of chunk tags: B-CHUNK for the first word of the chunk and I-CHUNK for every other word in the chunk. A chunk tag like B-NP is therefore made up of two parts: the prefix (B or I), which marks whether the token begins or continues a chunk, and the chunk type (NP).
A chunk may be only one word long or may contain multiple words (like Pierre Vinken in the preceding example). The O chunk tag is used for tokens that are not part of any chunk.
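To illustrate how these tags are consumed downstream, here is a small Python helper (hypothetical, not part of OpenNLP) that groups a tagged token sequence back into chunks using the B-/I-/O convention described above:

```python
def group_chunks(tokens, tags):
    """Group (token, chunk-tag) pairs into (chunk_type, words) chunks."""
    chunks, words, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag == "O" or tag.startswith("B-"):
            if words:                      # close the chunk in progress
                chunks.append((ctype, words))
            words, ctype = [], None
        if tag.startswith("B-"):           # B-CHUNK: first word of a chunk
            words, ctype = [tok], tag[2:]
        elif tag.startswith("I-"):         # I-CHUNK: continuation word
            if ctype == tag[2:]:
                words.append(tok)
            else:                          # stray I- tag: treat as a new chunk
                if words:
                    chunks.append((ctype, words))
                words, ctype = [tok], tag[2:]
    if words:
        chunks.append((ctype, words))
    return chunks

tokens = ["Pierre", "Vinken", "will", "join", "the", "board"]
tags = ["B-NP", "I-NP", "B-VP", "I-VP", "B-NP", "I-NP"]
chunks = group_chunks(tokens, tags)
```

The multi-word noun phrase "Pierre Vinken" comes back as a single NP chunk, while O-tagged tokens would simply be dropped from the output.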