Hidden Markov Models for Discrete-Valued Time Series
but if necessary this assumption can be relaxed. It is notationally convenient to assemble
the m state-dependent probabilities p_i(x) into the m × m diagonal matrix

P(x) = diag(p_1(x), p_2(x), ..., p_m(x)).
From this definition, it will be seen that HMMs are “parameter-driven” models, in the
sense used by Cox (1981). They are also state-space models whose latent process is
discrete-valued.
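As an illustrative sketch (not from the text), the matrix P(x) above can be assembled directly from the m state-dependent probability mass functions; the function name P_matrix and the Poisson example distributions are our own choices.

```python
import math

def P_matrix(x, pmfs):
    """Sketch: assemble the m x m diagonal matrix
    P(x) = diag(p_1(x), ..., p_m(x)) from the m state-dependent
    probability mass functions p_i, passed as callables."""
    m = len(pmfs)
    return [[pmfs[i](x) if i == j else 0.0 for j in range(m)]
            for i in range(m)]

# Example (our choice): two Poisson state-dependent distributions
def poisson_pmf(lam):
    return lambda x: math.exp(-lam) * lam**x / math.factorial(x)

Px = P_matrix(3, [poisson_pmf(1.0), poisson_pmf(5.0)])
```

Representing the pmfs as callables keeps the sketch agnostic to the family used in each state, which matters later when mixed families are considered.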
HMMs have a history going back at least to the work of Leonard Baum and co-authors
(see, e.g., Baum et al., 1970; Welch, 2003), although special cases were certainly considered
earlier than that. Such models are well known for their applications in speech recog-
nition (Juang and Rabiner, 1991; Rabiner, 1989) and bioinformatics (Durbin et al., 1998,
Chapter 3), but we focus here on their use as general-purpose models for discrete-valued
time series. HMMs go under a variety of other names or descriptions as well: latent Markov
models, hidden Markov processes, Markov-dependent mixtures, models subject to Markov
regime, and Markov-switching models, the last being more general in that the conditional
independence assumption (12.2) is relaxed.
12.2 Examples of Hidden Markov Time Series Models
We describe here a selection of the ways in which HMMs can be used as models for discrete-
valued time series. Some of these models are univariate, some are multivariate, but all
have the structure specified by (12.1) and (12.2). Furthermore, the process of estimating
parameters by numerical maximization of likelihood will be essentially the same for all;
see Section 12.4.1.
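To make the estimation idea concrete, the following is a minimal sketch of the HMM likelihood evaluated by the standard forward recursion with scaling, which is what a numerical maximizer would call repeatedly. The Poisson choice for the state-dependent distributions and the function names are our assumptions, not the text's.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability mass function (illustrative state-dependent choice)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def hmm_loglik(xs, delta, Gamma, lams):
    """Log-likelihood of a Poisson-HMM via the forward recursion,
    rescaling the forward probabilities at each step to avoid underflow.
    delta: initial state distribution, Gamma: transition probability
    matrix, lams: state-dependent Poisson means."""
    m = len(delta)
    phi = [delta[i] * poisson_pmf(xs[0], lams[i]) for i in range(m)]
    s = sum(phi)
    logL = math.log(s)
    phi = [v / s for v in phi]
    for x in xs[1:]:
        # one step of the recursion: propagate through Gamma, weight by p_j(x)
        phi = [sum(phi[i] * Gamma[i][j] for i in range(m)) * poisson_pmf(x, lams[j])
               for j in range(m)]
        s = sum(phi)
        logL += math.log(s)
        phi = [v / s for v in phi]
    return logL

ll = hmm_loglik([0, 3, 6, 1], [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [1.0, 5.0])
```

Swapping in a different state-dependent pmf changes only `poisson_pmf`, which is why the estimation machinery is essentially the same across the models in this section.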
12.2.1 Univariate Models for Discrete Observations
The Poisson–HMM is the simple model in which, conditional on a latent Markov chain {C_t}
on {1, 2, ..., m}, the observation X_t has a Poisson distribution with mean λ_i when C_t = i.
If the Markov chain is assumed stationary, that is, if δ* = δ, the process {X_t} is (strictly)
stationary and there are m² parameters to be estimated: the m state-dependent means λ_i
and all but one of the transition probabilities in each row of Γ, for example, the m² − m
off-diagonal entries. If the Markov chain is not assumed to be stationary, then δ* must
also be estimated (see, e.g., Leroux and Puterman, 1992). The number of parameters of an
HMM, being of the order m², limits the number of states that it is feasible to use in practice.
Alternatively, one can reduce the number of parameters by structuring the t.p.m. in some
parsimonious way; see Section 12.8 for references.
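As a sketch of the Poisson–HMM just described (our own illustration, using only the quantities named in the text: the initial distribution δ*, the t.p.m. Γ, and the state-dependent means λ_i), simulation alternates between a Markov step for the state and a Poisson draw for the observation.

```python
import math
import random

def simulate_poisson_hmm(T, delta, Gamma, lams, seed=1):
    """Simulate T observations from a Poisson-HMM (illustrative sketch).
    Note the model has m**2 free parameters: m means plus the
    m**2 - m off-diagonal entries of Gamma (rows sum to one)."""
    rng = random.Random(seed)
    m = len(delta)

    def draw_state(probs):
        # sample a state index from a discrete distribution
        u, cum = rng.random(), 0.0
        for i, p in enumerate(probs):
            cum += p
            if u < cum:
                return i
        return m - 1

    def draw_poisson(lam):
        # Knuth's multiplication method, adequate for small lam
        L, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    c = draw_state(delta)
    xs = []
    for _ in range(T):
        xs.append(draw_poisson(lams[c]))
        c = draw_state(Gamma[c])
    return xs

xs = simulate_poisson_hmm(200, [1.0, 0.0], [[0.95, 0.05], [0.05, 0.95]], [1.0, 8.0])
```

With well-separated means and a persistent Γ, the simulated series shows the overdispersion and serial dependence that motivate HMMs for count data.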
If we assume instead that the distribution in state i is binomial with parameters n_t (the
number of trials at time t) and π_i (the “success probability”), we have a model for a time
series of bounded counts. The special case n_t = 1 yields a model for binary time series. An
alternative to the Poisson in the case of unbounded counts that exhibit overdispersion is the
negative binomial. (By “negative binomial” we mean here the version of that distribution
that has the nonnegative integers as support.) Indeed, it is not essential to use distributions
from the same family in all the states of an HMM. We could use a Poisson distribution in