12
Hidden Markov Models for Discrete-Valued Time Series
Iain L. MacDonald and Walter Zucchini

CONTENTS
12.1 Structure and Components of a Hidden Markov Model
12.2 Examples of Hidden Markov Time Series Models
    12.2.1 Univariate Models for Discrete Observations
    12.2.2 Multivariate Models: Example of a Bivariate Model for Unbounded Counts
    12.2.3 Multivariate Models: Series of Multinomial-Like Observations
    12.2.4 Models with Covariates
12.3 Properties of HMMs
    12.3.1 Computing the Likelihood
    12.3.2 Marginal and Joint Distributions
    12.3.3 Moments
12.4 Parameter Estimation by Maximum Likelihood
    12.4.1 Maximum Likelihood via Direct Numerical Maximization
    12.4.2 Maximum Likelihood via the EM Algorithm
    12.4.3 Remarks
12.5 Forecasting and Decoding
    12.5.1 Forecast Distributions
    12.5.2 State Probabilities, State Prediction, and Decoding
12.6 Model Selection and Checking
    12.6.1 AIC and BIC
    12.6.2 Model Checking by Pseudo-Residuals
12.7 An Example: Weekly Sales of a Soap Product
12.8 Extensions and Concluding Remarks
References
12.1 Structure and Components of a Hidden Markov Model
In the search for useful models for discrete-valued time series, one possible approach is to take a standard model for continuous-valued series, for example, a Gaussian autoregressive moving-average process, and to modify or adapt it in order to allow for the discrete nature of the data. Another approach, the one followed here, is to start from a model for discrete data which assumes independence and then to relax the independence assumption by
allowing the distribution of the observations to switch among several possibilities according to a latent Markov chain. To take a simple example, a sequence of independent Poisson random variables with common mean would be inappropriate for a series of unbounded counts displaying significant autocorrelation. But a model that allowed the observations to be Poisson with mean either λ_1 or λ_2, the choice being made by a discrete-time Markov chain, might well be adequate; a model of this kind, it will be seen, allows for both serial dependence and overdispersion relative to the Poisson.
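To make this concrete, here is a minimal simulation sketch in Python (the means λ_1 = 1, λ_2 = 5 and the transition probabilities are arbitrary values chosen for illustration, not taken from the text); the simulated two-state switching Poisson series is both overdispersed relative to a single Poisson and serially correlated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters: Poisson mean lam[i] in state i, 2-state transition matrix
lam = np.array([1.0, 5.0])
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])

T = 10_000
states = np.empty(T, dtype=int)
states[0] = 0
for t in range(1, T):                          # simulate the latent Markov chain
    states[t] = rng.choice(2, p=Gamma[states[t - 1]])

x = rng.poisson(lam[states])                   # Poisson observation with state-chosen mean

print("mean:", x.mean(), " variance:", x.var())   # variance exceeds the mean: overdispersion
print("lag-1 autocorrelation:", np.corrcoef(x[:-1], x[1:])[0, 1])
```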
Such a latent Markov chain is a simple device by which any model X_t (t = 1, 2, ..., T) for a series of independent observations may be generalized to yield a model that allows for serial dependence. A hidden Markov model (HMM) relaxes the assumption of independence; however, we assume that the observations are conditionally independent, given the states occupied by an unobserved finite state-space Markov chain. Here, we confine ourselves to discrete data, but the observations could more generally be discrete or continuous, a mixture of those two, univariate, multivariate, circular in nature, spatial images: any kind of observation X_t, in fact, for which it is possible to propose a probabilistic model. The case of discrete data also includes several possibilities: univariate unbounded counts (i.e., Poisson-like observations), univariate bounded counts (i.e., binomial-like observations) including binary observations, observations of categories, and multivariate versions of these.
HMMs consist of two parts: first, an unobserved “parameter process” {C_t : t ∈ N} that is a Markov chain on {1, 2, ..., m}, and second, an observed process {X_t : t ∈ N} such that the distribution of X_t is determined only by the current state C_t, irrespective of all previous states and observations. (The symbol N denotes the natural numbers.) This structure is represented by the directed graph in Figure 12.1 and summarized by the following equations, in which C^(t) and X^(t) denote the “histories” of the processes {C_t} and {X_t} from time 1 to time t:

Pr(C_t | C^(t−1)) = Pr(C_t | C_{t−1}),   t = 2, 3, ...    (12.1)

Pr(X_t | X^(t−1), C^(t)) = Pr(X_t | C_t),   t ∈ N.    (12.2)
FIGURE 12.1
Directed graph of basic HMM.

The Markov chain is assumed here to be irreducible, aperiodic, and (unless it is stated otherwise) homogeneous. The transition probability matrix (t.p.m.) is denoted by Γ = (γ_ij), and the initial distribution, that of C_1, by the row vector δ^(1). (All vectors are row vectors.) The stationary distribution of the Markov chain is denoted by δ and if, as is often assumed, δ^(1) = δ, then the Markov chain is stationary. Unless otherwise indicated, the “state-dependent probability” p_i(x) = Pr(X_t = x | C_t = i) is assumed to be independent of t, but if necessary this assumption can be relaxed. It is notationally convenient to assemble the m state-dependent probabilities p_i(x) into the m × m diagonal matrix

P(x) = diag(p_1(x), p_2(x), ..., p_m(x)).
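As a small numerical illustration (a sketch only; the two-state Poisson example and all values below are assumptions made for the example, not part of the text), δ can be computed by solving δΓ = δ subject to δ1′ = 1, and P(x) evaluated for a given observation x:

```python
import numpy as np
from scipy.stats import poisson

# Illustrative two-state Poisson-HMM
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
lam = np.array([1.0, 5.0])      # state-dependent Poisson means

def stationary_dist(Gamma):
    """Solve delta*Gamma = delta with delta summing to 1, via delta*(I - Gamma + U) = 1."""
    m = Gamma.shape[0]
    A = np.eye(m) - Gamma + np.ones((m, m))
    return np.linalg.solve(A.T, np.ones(m))

def P(x):
    """Diagonal matrix P(x) = diag(p_1(x), ..., p_m(x)) for the Poisson-HMM."""
    return np.diag(poisson.pmf(x, lam))

delta = stationary_dist(Gamma)
print(delta)     # [0.6667, 0.3333] for this Gamma
print(P(3))      # diag(p_1(3), p_2(3))
```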
From this denition, it will be seen that HMMs are “parameter-driven” models, in the
sense used by Cox (1981). They are also state-space models whose latent process is
discrete-valued.
HMMs have a history going back at least to the work of Leonard Baum and co-authors
(see, e.g., Baum et al., 1970; Welch, 2003), although special cases were certainly considered
earlier than that. Such models are well known for their applications in speech recog-
nition (Juang and Rabiner, 1991; Rabiner, 1989) and bioinformatics (Durbin et al., 1998,
Chapter 3), but we focus here on their use as general-purpose models for discrete-valued
time series. HMMs go under a variety of other names or descriptions as well: latent Markov
models, hidden Markov processes, Markov-dependent mixtures, models subject to Markov
regime, and Markov-switching models, the last being more general in that the conditional
independence assumption (12.2) is relaxed.
12.2 Examples of Hidden Markov Time Series Models
We describe here a selection of the ways in which HMMs can be used as models for discrete-valued time series. Some of these models are univariate, some are multivariate, but all have the structure specified by (12.1) and (12.2). Furthermore, the process of estimating parameters by numerical maximization of likelihood will be essentially the same for all; see Section 12.4.1.
12.2.1 Univariate Models for Discrete Observations
The Poisson–HMM is the simple model in which, conditional on a latent Markov chain {C_t} on {1, 2, ..., m}, the observation X_t has a Poisson distribution with mean λ_i when C_t = i. If the Markov chain is assumed stationary, that is, if δ^(1) = δ, the process {X_t} is (strictly) stationary and there are m² parameters to be estimated: the m state-dependent means λ_i and all but one of the transition probabilities in each row of Γ, for example, the m² − m off-diagonal entries. If the Markov chain is not assumed to be stationary, then δ^(1) must also be estimated (see, e.g., Leroux and Puterman, 1992). The number of parameters of an HMM, being of the order m², limits the number of states that it is feasible to use in practice. Alternatively, one can reduce the number of parameters by structuring the t.p.m. in some parsimonious way; see Section 12.8 for references.
If we assume instead that the distribution in state i is binomial with parameters n_t (the number of trials at time t) and π_i (the “success probability”), we have a model for a time series of bounded counts. The special case n_t = 1 yields a model for binary time series. An alternative to the Poisson in the case of unbounded counts that exhibit overdispersion is the negative binomial. (By “negative binomial” we mean here the version of that distribution that has the nonnegative integers as support.) Indeed, it is not essential to use distributions from the same family in all the states of an HMM. We could use a Poisson distribution in
one state of a two-state HMM and a negative binomial in the second state. If instead we use a degenerate distribution at zero in the second state, the resulting marginal distribution of the observations is zero-inflated Poisson.
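As a sketch of that last possibility (parameter values are invented for illustration), the state-dependent probabilities of a two-state HMM with a Poisson distribution in state 1 and a degenerate distribution at zero in state 2 combine, under the stationary distribution δ, into the zero-inflated Poisson marginal Pr(X_t = x) = Σ_i δ_i p_i(x):

```python
import numpy as np
from scipy.stats import poisson

lam = 4.0                             # Poisson mean in state 1 (assumed value)
delta = np.array([0.7, 0.3])          # stationary distribution of the Markov chain (assumed)

def p_state(x):
    """State-dependent probabilities: Poisson in state 1, point mass at zero in state 2."""
    return np.array([poisson.pmf(x, lam), 1.0 if x == 0 else 0.0])

def marginal(x):
    """Marginal pmf of X_t under stationarity: a zero-inflated Poisson."""
    return delta @ p_state(x)

print(marginal(0))    # 0.7*exp(-4) + 0.3: extra probability mass at zero
print(marginal(3))    # 0.7 * (Poisson(4) pmf at 3)
```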
12.2.2 Multivariate Models: Example of a Bivariate Model for Unbounded Counts
In order to specify an HMM for (say) bivariate observations (X_t, Y_t) of unbounded counts, we need to specify the conditional distribution of (X_t, Y_t) given C_t. One can specify an appropriate bivariate distribution for X_t and Y_t for each of the m states of the Markov chain, but this is not necessarily an easy task, as there is (for instance) no canonical bivariate Poisson distribution. A simple and very convenient alternative that works surprisingly often is to assume that, given C_t, the random variables X_t and Y_t are conditionally independent; that is, they are “contemporaneously conditionally independent.” This assumption leads to (unconditional) dependence between X_t and Y_t, very much in the same way as the assumption of conditional independence in a univariate series, given the Markov chain, induces unconditional serial dependence.
As an example, assume that, conditional on the Markov chain {C_t}, X_t and Y_t have independent Poisson distributions and that the pairs (X_t, Y_t) are also conditionally independent, the latter assumption being the usual “longitudinal conditional independence” of an HMM. The dependence structure of such a bivariate model is represented by the directed graph in Figure 12.2. The resulting model consists of serially dependent observations and, in addition, X_t and Y_t are dependent. The marginal distributions of X_t and Y_t are mixtures of Poissons.
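That unconditional dependence can be checked directly: with state-dependent Poisson means λ_i for X_t and μ_i for Y_t, conditional independence given C_t gives Cov(X_t, Y_t) = Σ_i δ_i λ_i μ_i − (Σ_i δ_i λ_i)(Σ_i δ_i μ_i) under the stationary distribution δ, which is nonzero in general. A small sketch (all values are illustrative assumptions):

```python
import numpy as np

delta = np.array([0.5, 0.5])      # stationary distribution of the Markov chain
lam   = np.array([1.0, 5.0])      # state-dependent Poisson means of X_t
mu    = np.array([2.0, 8.0])      # state-dependent Poisson means of Y_t

# Cov(X_t, Y_t) = E[lam_C * mu_C] - E[lam_C] * E[mu_C], by conditional independence given C_t
cov_xy = delta @ (lam * mu) - (delta @ lam) * (delta @ mu)
print(cov_xy)                     # 21.0 - 3.0*5.0 = 6.0: nonzero, so X_t and Y_t are dependent
```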
12.2.3 Multivariate Models: Series of Multinomial-Like Observations
Suppose that at time t there are n_t trials, each of which can result in one of q ≥ 2 outcomes. To allow for serial dependence, we can simply assume that, conditional on a latent Markov chain, the observations are independent multinomial, with the conditional distribution at time t being multinomial with number of trials n_t and some vector π_i of “success probabilities,” i being the current state of the Markov chain. (This vector π_i can be allowed to depend on t, if necessary.) The unconditional distribution of the observations at time t is then a mixture of multinomials.
An important special case of such a model is that in which n_t = 1 for all t; this provides a model for categorical series such as DNA base sequences (q = 4) or amino acid sequences (q = 20). The former example goes back at least to Churchill (1989).
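As a sketch of this special case with q = 4 (a DNA-like alphabet; the two-state transition matrix and the vectors π_i below are invented purely for illustration), a categorical series can be generated as follows:

```python
import numpy as np

rng = np.random.default_rng(7)

alphabet = np.array(list("ACGT"))                # q = 4 categories
Gamma = np.array([[0.95, 0.05],                  # two-state Markov chain
                  [0.10, 0.90]])
pi = np.array([[0.40, 0.10, 0.10, 0.40],         # pi_1: category probabilities in state 1
               [0.10, 0.40, 0.40, 0.10]])        # pi_2: category probabilities in state 2

T = 60
c = 0                                            # initial state
seq = []
for t in range(T):
    c = rng.choice(2, p=Gamma[c])                # move the hidden chain
    seq.append(rng.choice(alphabet, p=pi[c]))    # one trial (n_t = 1) per time point

print("".join(seq))
```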
FIGURE 12.2
Directed graph of bivariate HMM assuming contemporaneous conditional independence.
12.2.4 Models with Covariates
A simple way to allow for the effect of covariates, including time-dependent covariates, in (say) a Poisson–HMM, is to leave the Markov chain unchanged but to model the log of ₜλ_i (the mean in state i at time t) as a linear function of a (row) vector of covariates, y_t:

ln ₜλ_i = β_i y_t′.

This makes it possible, for example, to build time trend or seasonal components into the state-dependent distributions. A slightly more difficult, but sometimes more appropriate, way of allowing for covariates is to model the entries of Γ as functions of the covariates, in such a way of course as to respect the row-sum and nonnegativity constraints on the transition probabilities.
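A minimal sketch of the first approach (the covariate design and the coefficient values β_i are hypothetical, introduced only to illustrate the log-linear relation above):

```python
import numpy as np

def y(t, period=12):
    """Covariate (row) vector at time t: intercept, linear trend, one seasonal term."""
    return np.array([1.0, t, np.sin(2 * np.pi * t / period)])

beta = np.array([[0.0, 0.001, 0.3],    # coefficients for state 1 (illustrative)
                 [1.5, 0.002, 0.1]])   # coefficients for state 2 (illustrative)

def state_means(t):
    """State-dependent Poisson means at time t: exp(beta_i . y_t) for each state i."""
    return np.exp(beta @ y(t))

print(state_means(10))    # one mean per state, varying with t through trend and season
```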
12.3 Properties of HMMs
12.3.1 Computing the Likelihood
An important property of HMMs is that the likelihood is easily computed. Consider an HMM with initial distribution δ^(1), t.p.m. Γ, and state-dependent distributions P(x). The likelihood, that is, the joint probability of the observations x_1, x_2, ..., x_T, is a multiple sum:

L_T = Pr(X^(T) = x^(T)) = Σ_{c_1=1}^m Σ_{c_2=1}^m ··· Σ_{c_T=1}^m Pr(C^(T) = c^(T)) Pr(X^(T) = x^(T) | C^(T) = c^(T)).

That is,

L_T = Σ_{c_1,c_2,...,c_T=1}^m Pr(C_1 = c_1) Π_{k=2}^T Pr(C_k = c_k | C_{k−1} = c_{k−1}) Π_{k=1}^T Pr(X_k = x_k | C_k = c_k).    (12.3)

It can be shown that this sum can be given as a matrix product:

L_T = δ^(1) P(x_1) Γ P(x_2) Γ P(x_3) ··· Γ P(x_T) 1′,    (12.4)

the vector 1 being a row vector of ones. (Justification for this and other unproved assertions can be found in Zucchini and MacDonald (2009).) If the Markov chain is assumed to be stationary, δ^(1) = δ is computed from Γ; that is, as the solution of the linear system δΓ = δ subject to δ1′ = 1.