12
Hidden Markov Models for Discrete-Valued Time Series
Iain L. MacDonald and Walter Zucchini

CONTENTS
12.1 Structure and Components of a Hidden Markov Model
12.2 Examples of Hidden Markov Time Series Models
    12.2.1 Univariate Models for Discrete Observations
    12.2.2 Multivariate Models: Example of a Bivariate Model for Unbounded Counts
    12.2.3 Multivariate Models: Series of Multinomial-Like Observations
    12.2.4 Models with Covariates
12.3 Properties of HMMs
    12.3.1 Computing the Likelihood
    12.3.2 Marginal and Joint Distributions
    12.3.3 Moments
12.4 Parameter Estimation by Maximum Likelihood
    12.4.1 Maximum Likelihood via Direct Numerical Maximization
    12.4.2 Maximum Likelihood via the EM Algorithm
    12.4.3 Remarks
12.5 Forecasting and Decoding
    12.5.1 Forecast Distributions
    12.5.2 State Probabilities, State Prediction, and Decoding
12.6 Model Selection and Checking
    12.6.1 AIC and BIC
    12.6.2 Model Checking by Pseudo-Residuals
12.7 An Example: Weekly Sales of a Soap Product
12.8 Extensions and Concluding Remarks
References
12.1 Structure and Components of a Hidden Markov Model
In the search for useful models for discrete-valued time series, one possible approach is to take a standard model for continuous-valued series, for example, a Gaussian autoregressive moving-average process, and to modify or adapt it in order to allow for the discrete nature of the data. Another approach, the one followed here, is to start from a model for discrete data which assumes independence and then to relax the independence assumption by
allowing the distribution of the observations to switch among several possibilities according to a latent Markov chain. To take a simple example, a sequence of independent Poisson random variables with common mean would be inappropriate for a series of unbounded counts displaying significant autocorrelation. But a model that allowed the observations to be Poisson with mean either λ_1 or λ_2, the choice being made by a discrete-time Markov chain, might well be adequate; a model of this kind, it will be seen, allows for both serial dependence and overdispersion relative to the Poisson.
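To make this concrete, here is a minimal simulation sketch in Python (the means λ_1 = 1, λ_2 = 5 and the transition probabilities are arbitrary values chosen for illustration, not taken from the text); the simulated two-state switching Poisson series is both overdispersed relative to a single Poisson and serially correlated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters: Poisson mean lam[i] in state i, 2-state transition matrix
lam = np.array([1.0, 5.0])
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])

T = 10_000
states = np.empty(T, dtype=int)
states[0] = 0
for t in range(1, T):                          # simulate the latent Markov chain
    states[t] = rng.choice(2, p=Gamma[states[t - 1]])

x = rng.poisson(lam[states])                   # Poisson observation with state-chosen mean

print("mean:", x.mean(), " variance:", x.var())   # variance exceeds the mean: overdispersion
print("lag-1 autocorrelation:", np.corrcoef(x[:-1], x[1:])[0, 1])
```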
Such a latent Markov chain is a simple device by which any model X_t (t = 1, 2, ..., T) for a series of independent observations may be generalized to yield a model that allows for serial dependence. A hidden Markov model (HMM) relaxes the assumption of independence; however, we assume that the observations are conditionally independent, given the states occupied by an unobserved finite state-space Markov chain. Here, we confine ourselves to discrete data, but the observations could more generally be discrete or continuous, a mixture of those two, univariate, multivariate, circular in nature, spatial images: any kind of observation X_t, in fact, for which it is possible to propose a probabilistic model. The case of discrete data also includes several possibilities: univariate unbounded counts (i.e., Poisson-like observations), univariate bounded counts (i.e., binomial-like observations) including binary observations, observations of categories, and multivariate versions of these.
HMMs consist of two parts: first, an unobserved “parameter process” {C_t : t ∈ N} that is a Markov chain on {1, 2, ..., m}, and second, an observed process {X_t : t ∈ N} such that the distribution of X_t is determined only by the current state C_t, irrespective of all previous states and observations. (The symbol N denotes the natural numbers.) This structure is represented by the directed graph in Figure 12.1 and summarized by the following equations, in which C^(t) and X^(t) denote the “histories” of the processes {C_t} and {X_t} from time 1 to time t:

Pr(C_t | C^(t−1)) = Pr(C_t | C_{t−1}),   t = 2, 3, ...    (12.1)

Pr(X_t | X^(t−1), C^(t)) = Pr(X_t | C_t),   t ∈ N.    (12.2)
FIGURE 12.1
Directed graph of basic HMM.

The Markov chain is assumed here to be irreducible, aperiodic, and (unless it is stated otherwise) homogeneous. The transition probability matrix (t.p.m.) is denoted by Γ = (γ_ij), and the initial distribution, that of C_1, by the row vector δ^(1). (All vectors are row vectors.) The stationary distribution of the Markov chain is denoted by δ and if, as is often assumed, δ^(1) = δ, then the Markov chain is stationary. Unless otherwise indicated, the “state-dependent probability” p_i(x) = Pr(X_t = x | C_t = i) is assumed to be independent of t, but if necessary this assumption can be relaxed. It is notationally convenient to assemble the m state-dependent probabilities p_i(x) into the m × m diagonal matrix

P(x) = diag(p_1(x), p_2(x), ..., p_m(x)).
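As a small numerical illustration (a sketch only; the two-state Poisson example and all values below are assumptions made for the example, not part of the text), δ can be computed by solving δΓ = δ subject to δ1′ = 1, and P(x) evaluated for a given observation x:

```python
import numpy as np
from scipy.stats import poisson

# Illustrative two-state Poisson-HMM
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
lam = np.array([1.0, 5.0])      # state-dependent Poisson means

def stationary_dist(Gamma):
    """Solve delta*Gamma = delta with delta summing to 1, via delta*(I - Gamma + U) = 1."""
    m = Gamma.shape[0]
    A = np.eye(m) - Gamma + np.ones((m, m))
    return np.linalg.solve(A.T, np.ones(m))

def P(x):
    """Diagonal matrix P(x) = diag(p_1(x), ..., p_m(x)) for the Poisson-HMM."""
    return np.diag(poisson.pmf(x, lam))

delta = stationary_dist(Gamma)
print(delta)     # [0.6667, 0.3333] for this Gamma
print(P(3))      # diag(p_1(3), p_2(3))
```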
From this denition, it will be seen that HMMs are “parameter-driven” models, in the
sense used by Cox (1981). They are also state-space models whose latent process is
discrete-valued.
HMMs have a history going back at least to the work of Leonard Baum and co-authors
(see, e.g., Baum et al., 1970; Welch, 2003), although special cases were certainly considered
earlier than that. Such models are well known for their applications in speech recog-
nition (Juang and Rabiner, 1991; Rabiner, 1989) and bioinformatics (Durbin et al., 1998,
Chapter 3), but we focus here on their use as general-purpose models for discrete-valued
time series. HMMs go under a variety of other names or descriptions as well: latent Markov
models, hidden Markov processes, Markov-dependent mixtures, models subject to Markov
regime, and Markov-switching models, the last being more general in that the conditional
independence assumption (12.2) is relaxed.
12.2 Examples of Hidden Markov Time Series Models
We describe here a selection of the ways in which HMMs can be used as models for discrete-valued time series. Some of these models are univariate, some are multivariate, but all have the structure specified by (12.1) and (12.2). Furthermore, the process of estimating parameters by numerical maximization of likelihood will be essentially the same for all; see Section 12.4.1.
12.2.1 Univariate Models for Discrete Observations
The Poisson–HMM is the simple model in which, conditional on a latent Markov chain {C_t} on {1, 2, ..., m}, the observation X_t has a Poisson distribution with mean λ_i when C_t = i. If the Markov chain is assumed stationary, that is, if δ^(1) = δ, the process {X_t} is (strictly) stationary and there are m² parameters to be estimated: the m state-dependent means λ_i and all but one of the transition probabilities in each row of Γ, for example, the m² − m off-diagonal entries. If the Markov chain is not assumed to be stationary, then δ^(1) must also be estimated (see, e.g., Leroux and Puterman, 1992). The number of parameters of an HMM, being of the order m², limits the number of states that it is feasible to use in practice. Alternatively, one can reduce the number of parameters by structuring the t.p.m. in some parsimonious way; see Section 12.8 for references.
If we assume instead that the distribution in state i is binomial with parameters n_t (the number of trials at time t) and π_i (the “success probability”), we have a model for a time series of bounded counts. The special case n_t = 1 yields a model for binary time series. An alternative to the Poisson in the case of unbounded counts that exhibit overdispersion is the negative binomial. (By “negative binomial” we mean here the version of that distribution that has the nonnegative integers as support.) Indeed, it is not essential to use distributions from the same family in all the states of an HMM. We could use a Poisson distribution in
one state of a two-state HMM and a negative binomial in the second state. If instead we use a degenerate distribution at zero in the second state, the resulting marginal distribution of the observations is zero-inflated Poisson.
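As a sketch of that last possibility (parameter values are invented for illustration), the state-dependent probabilities of a two-state HMM with a Poisson distribution in state 1 and a degenerate distribution at zero in state 2 combine, under the stationary distribution δ, into the zero-inflated Poisson marginal Pr(X_t = x) = Σ_i δ_i p_i(x):

```python
import numpy as np
from scipy.stats import poisson

lam = 4.0                             # Poisson mean in state 1 (assumed value)
delta = np.array([0.7, 0.3])          # stationary distribution of the Markov chain (assumed)

def p_state(x):
    """State-dependent probabilities: Poisson in state 1, point mass at zero in state 2."""
    return np.array([poisson.pmf(x, lam), 1.0 if x == 0 else 0.0])

def marginal(x):
    """Marginal pmf of X_t under stationarity: a zero-inflated Poisson."""
    return delta @ p_state(x)

print(marginal(0))    # 0.7*exp(-4) + 0.3: extra probability mass at zero
print(marginal(3))    # 0.7 * (Poisson(4) pmf at 3)
```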
12.2.2 Multivariate Models: Example of a Bivariate Model for Unbounded Counts
In order to specify an HMM for (say) bivariate observations (X_t, Y_t) of unbounded counts, we need to specify the conditional distribution of (X_t, Y_t) given C_t. One can specify an appropriate bivariate distribution for X_t and Y_t for each of the m states of the Markov chain, but this is not necessarily an easy task, as there is (for instance) no canonical bivariate Poisson distribution. A simple and very convenient alternative that works surprisingly often is to assume that, given C_t, the random variables X_t and Y_t are conditionally independent; that is, they are “contemporaneously conditionally independent.” This assumption leads to (unconditional) dependence between X_t and Y_t, very much in the same way as the assumption of conditional independence in a univariate series, given the Markov chain, induces unconditional serial dependence.
As an example, assume that, conditional on the Markov chain {C_t}, X_t and Y_t have independent Poisson distributions and that the pairs (X_t, Y_t) are also conditionally independent, the latter assumption being the usual “longitudinal conditional independence” of an HMM. The dependence structure of such a bivariate model is represented by the directed graph in Figure 12.2. The resulting model consists of serially dependent observations and, in addition, X_t and Y_t are dependent. The marginal distributions of X_t and Y_t are mixtures of Poissons.
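That unconditional dependence can be checked directly: with state-dependent Poisson means λ_i for X_t and μ_i for Y_t, conditional independence given C_t gives Cov(X_t, Y_t) = Σ_i δ_i λ_i μ_i − (Σ_i δ_i λ_i)(Σ_i δ_i μ_i) under the stationary distribution δ, which is nonzero in general. A small sketch (all values are illustrative assumptions):

```python
import numpy as np

delta = np.array([0.5, 0.5])      # stationary distribution of the Markov chain
lam   = np.array([1.0, 5.0])      # state-dependent Poisson means of X_t
mu    = np.array([2.0, 8.0])      # state-dependent Poisson means of Y_t

# Cov(X_t, Y_t) = E[lam_C * mu_C] - E[lam_C] * E[mu_C], by conditional independence given C_t
cov_xy = delta @ (lam * mu) - (delta @ lam) * (delta @ mu)
print(cov_xy)                     # 21.0 - 3.0*5.0 = 6.0: nonzero, so X_t and Y_t are dependent
```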
12.2.3 Multivariate Models: Series of Multinomial-Like Observations
Suppose that at time t there are n_t trials, each of which can result in one of q ≥ 2 outcomes. To allow for serial dependence, we can simply assume that, conditional on a latent Markov chain, the observations are independent multinomial, with the conditional distribution at time t being multinomial with number of trials n_t and some vector π_i of “success probabilities,” i being the current state of the Markov chain. (This vector π_i can be allowed to depend on t, if necessary.) The unconditional distribution of the observations at time t is then a mixture of multinomials.
An important special case of such a model is that in which n_t = 1 for all t; this provides a model for categorical series such as DNA base sequences (q = 4) or amino acid sequences (q = 20). The former example goes back at least to Churchill (1989).
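As a sketch of this special case with q = 4 (a DNA-like alphabet; the two-state transition matrix and the vectors π_i below are invented purely for illustration), a categorical series can be generated as follows:

```python
import numpy as np

rng = np.random.default_rng(7)

alphabet = np.array(list("ACGT"))                # q = 4 categories
Gamma = np.array([[0.95, 0.05],                  # two-state Markov chain
                  [0.10, 0.90]])
pi = np.array([[0.40, 0.10, 0.10, 0.40],         # pi_1: category probabilities in state 1
               [0.10, 0.40, 0.40, 0.10]])        # pi_2: category probabilities in state 2

T = 60
c = 0                                            # initial state
seq = []
for t in range(T):
    c = rng.choice(2, p=Gamma[c])                # move the hidden chain
    seq.append(rng.choice(alphabet, p=pi[c]))    # one trial (n_t = 1) per time point

print("".join(seq))
```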
FIGURE 12.2
Directed graph of bivariate HMM assuming contemporaneous conditional independence.
12.2.4 Models with Covariates
A simple way to allow for the effect of covariates, including time-dependent covariates, in (say) a Poisson–HMM, is to leave the Markov chain unchanged but to model the log of ₜλ_i (the mean in state i at time t) as a linear function of a (row) vector of covariates, y_t:

ln ₜλ_i = β_i y_t′.

This makes it possible, for example, to build time trend or seasonal components into the state-dependent distributions. A slightly more difficult, but sometimes more appropriate, way of allowing for covariates is to model the entries of Γ as functions of the covariates, in such a way of course as to respect the row-sum and nonnegativity constraints on the transition probabilities.
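A minimal sketch of the first approach (the covariate design and the coefficient values β_i are hypothetical, introduced only to illustrate the log-linear relation above):

```python
import numpy as np

def y(t, period=12):
    """Covariate (row) vector at time t: intercept, linear trend, one seasonal term."""
    return np.array([1.0, t, np.sin(2 * np.pi * t / period)])

beta = np.array([[0.0, 0.001, 0.3],    # coefficients for state 1 (illustrative)
                 [1.5, 0.002, 0.1]])   # coefficients for state 2 (illustrative)

def state_means(t):
    """State-dependent Poisson means at time t: exp(beta_i . y_t) for each state i."""
    return np.exp(beta @ y(t))

print(state_means(10))    # one mean per state, varying with t through trend and season
```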
12.3 Properties of HMMs
12.3.1 Computing the Likelihood
An important property of HMMs is that the likelihood is easily computed. Consider an HMM with initial distribution δ^(1), t.p.m. Γ, and state-dependent distributions P(x). The likelihood, that is, the joint probability of the observations x_1, x_2, ..., x_T, is a multiple sum:

L_T = Pr(X^(T) = x^(T)) = Σ_{c_1=1}^m Σ_{c_2=1}^m ··· Σ_{c_T=1}^m Pr(C^(T) = c^(T)) Pr(X^(T) = x^(T) | C^(T) = c^(T)).

That is,

L_T = Σ_{c_1,c_2,...,c_T=1}^m Pr(C_1 = c_1) Π_{k=2}^T Pr(C_k = c_k | C_{k−1} = c_{k−1}) Π_{k=1}^T Pr(X_k = x_k | C_k = c_k).    (12.3)

It can be shown that this sum can be given as a matrix product:

L_T = δ^(1) P(x_1) Γ P(x_2) Γ P(x_3) ··· Γ P(x_T) 1′,    (12.4)

the vector 1 being a row vector of ones. (Justification for this and other unproved assertions can be found in Zucchini and MacDonald (2009).) If the Markov chain is assumed to be stationary, δ^(1) = δ is computed from Γ; that is, as the solution of the linear system δΓ = δ subject to δ1′ = 1.