The time complexity of decoding and evaluating the canonical forms of the hidden Markov model with N states and T observations is O(N²T). Training the HMM with the Baum-Welch algorithm is O(N²TM), where M is the number of iterations.
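To make the O(N²T) bound concrete, the following is a minimal sketch of Viterbi decoding in log space; the array-based parameters (logPi, logA, and logB) are an assumed layout for illustration, not the implementation discussed in this chapter. The inner pair of loops over the N source and N destination states, repeated for each of the T observations, accounts for the O(N²T) cost.

```scala
object ViterbiSketch {
  def viterbi(
      logPi: Array[Double],        // N initial log probabilities (assumed layout)
      logA: Array[Array[Double]],  // N x N transition log probabilities
      logB: Array[Array[Double]],  // N x K emission log probabilities
      obs: Array[Int]              // T observation indices in [0, K)
  ): Array[Int] = {
    val N = logPi.length
    val T = obs.length
    val delta = Array.ofDim[Double](T, N)  // best log score ending in state j at time t
    val psi = Array.ofDim[Int](T, N)       // back-pointers for the best predecessor

    for (j <- 0 until N) delta(0)(j) = logPi(j) + logB(j)(obs(0))

    // T - 1 time steps, each scanning N x N state pairs: O(N²T) overall
    for (t <- 1 until T; j <- 0 until N) {
      var best = Double.NegativeInfinity
      var arg = 0
      for (i <- 0 until N) {
        val s = delta(t - 1)(i) + logA(i)(j)
        if (s > best) { best = s; arg = i }
      }
      delta(t)(j) = best + logB(j)(obs(t))
      psi(t)(j) = arg
    }

    // Backtrack from the most likely final state
    val path = new Array[Int](T)
    path(T - 1) = (0 until N).maxBy(j => delta(T - 1)(j))
    for (t <- T - 2 to 0 by -1) path(t) = psi(t + 1)(path(t + 1))
    path
  }
}
```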
There are several options for improving the performance of the HMM.
The training of linear chain conditional random fields is implemented using the same dynamic programming techniques as the HMM (the Viterbi and forward-backward passes, and so on). Its time complexity for training on T data sequences, N labels (or expected outcomes), and M weights/features λ is O(MTN²).
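As a concrete illustration of where the N² factor comes from, below is a hedged sketch of the forward pass that CRF training repeats for every weight update. The logPhi array of log potentials, assumed here to be pre-assembled from the M features and weights λ (with logPhi(0)(0)(j) holding the initial potentials), is a convention chosen for this example.

```scala
object CrfForwardSketch {
  // Numerically stable log(sum(exp(xs)))
  private def logSumExp(xs: Array[Double]): Double = {
    val m = xs.max
    if (m == Double.NegativeInfinity) m
    else m + math.log(xs.map(x => math.exp(x - m)).sum)
  }

  /** Log partition function log Z(x) for one sequence of length T with N labels.
   *  logPhi(t)(i)(j) is the log potential of moving from label i at t-1 to
   *  label j at t; logPhi(0)(0)(j) holds the initial potentials (assumption).
   *  Each of the T steps scans N x N label pairs, so one pass is O(TN²);
   *  repeating it across the M weights/features yields the O(MTN²) cost.
   */
  def logZ(logPhi: Array[Array[Array[Double]]], N: Int): Double = {
    val T = logPhi.length
    var alpha = Array.tabulate(N)(j => logPhi(0)(0)(j))
    for (t <- 1 until T) {
      alpha = Array.tabulate(N)(j =>
        logSumExp(Array.tabulate(N)(i => alpha(i) + logPhi(t)(i)(j)))
      )
    }
    logSumExp(alpha)
  }
}
```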
The time complexity of training a CRF can be reduced by distributing the computation of the log likelihood and its gradient over multiple nodes, using a framework such as Akka or Apache Spark, as described in Chapter 12, Scalable Frameworks [7:13].
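One possible shape of that distribution with Apache Spark is sketched below; logLikelihoodAndGrad is a hypothetical per-sequence routine standing in for one forward-backward pass, and the TrainingSeq layout is assumed. Each node evaluates its partition of the training sequences, and the partial log likelihoods and gradients are summed with a single reduce.

```scala
import org.apache.spark.SparkContext

object DistributedCrfSketch {
  // Assumed layout: a training sequence is an array of (feature vector, label) pairs
  type TrainingSeq = Array[(Array[Double], Int)]

  // Hypothetical per-sequence evaluation: stands in for one forward-backward
  // pass over the sequence under the current weights, O(TN²) each
  def logLikelihoodAndGrad(weights: Array[Double])(seq: TrainingSeq): (Double, Array[Double]) =
    (0.0, new Array[Double](weights.length))  // placeholder result

  def evaluate(
      sc: SparkContext,
      data: Seq[TrainingSeq],
      weights: Array[Double]
  ): (Double, Array[Double]) = {
    val w = sc.broadcast(weights)  // ship the current weights to every node once
    sc.parallelize(data)
      .map(seq => logLikelihoodAndGrad(w.value)(seq))  // local partial results
      .reduce { case ((ll1, g1), (ll2, g2)) =>         // sum across nodes
        (ll1 + ll2, g1.zip(g2).map { case (a, b) => a + b })
      }
  }
}
```

The broadcast variable avoids re-shipping the weight vector with every task, which matters because the weights change on each training iteration while the sequences stay partitioned in place.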