FIGURE 12.5
Soap sales: normal QQ plot of the (mid-)quantile residuals for the one-, two- and three-state stationary models.
The fitted three-state stationary Poisson–HMM has transition probability matrix
$$
\Gamma = \begin{pmatrix}
0.864 & 0.117 & 0.019 \\
0.445 & 0.538 & 0.017 \\
0.000 & 0.298 & 0.702
\end{pmatrix},
$$
and stationary distribution δ = (0.722, 0.220, 0.058).
The mean (5.42) and variance (14.72) implied by the model certainly reflect the observed
overdispersion. The implied ACF (see Section 12.3.3) is given by
$$
\rho_k = 0.5392 \times 0.6823^k + 0.0926 \times 0.4220^k.
$$
(The non-unit eigenvalues of Γ are 0.6823 and 0.4220.) This ACF is close to the sample ACF
for the first four lags; see Table 12.3.
Global decoding and local decoding were carried out, and the results are shown in
Figure 12.6. The 7 weeks (out of 242) in which global decoding and local decoding differ
are indicated there.
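The mechanics of global decoding can be illustrated with a short sketch. The following Python fragment (not part of the original analysis) applies the Viterbi algorithm to a Poisson–HMM, using the fitted Γ and δ quoted above; the state-dependent means lam are hypothetical placeholders rather than the estimates obtained for the soap data.

import numpy as np
from scipy.stats import poisson

Gamma = np.array([[0.864, 0.117, 0.019],
                  [0.445, 0.538, 0.017],
                  [0.000, 0.298, 0.702]])
delta = np.array([0.722, 0.220, 0.058])
lam = np.array([3.0, 8.0, 15.0])   # hypothetical state-dependent Poisson means

def viterbi(x, Gamma, delta, lam):
    """Most likely state sequence (1-based) given observed counts x."""
    x = np.asarray(x)
    n, m = len(x), len(delta)
    logp = poisson.logpmf(x[:, None], lam)          # n x m log state-dependent densities
    with np.errstate(divide="ignore"):              # Gamma may contain exact zeros
        logGamma = np.log(Gamma)
    xi = np.empty((n, m))
    back = np.zeros((n, m), dtype=int)
    xi[0] = np.log(delta) + logp[0]
    for t in range(1, n):
        cand = xi[t - 1][:, None] + logGamma        # cand[i, j]: move from state i to j
        back[t] = cand.argmax(axis=0)
        xi[t] = cand.max(axis=0) + logp[t]
    states = np.empty(n, dtype=int)
    states[-1] = xi[-1].argmax()
    for t in range(n - 2, -1, -1):                  # backtrack the optimal path
        states[t] = back[t + 1, states[t + 1]]
    return states + 1

print(viterbi([2, 1, 0, 9, 7, 12, 20, 3, 2], Gamma, delta, lam))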
TABLE 12.3
Three-State Stationary Poisson–HMM for Soap Sales Data: Comparison of Sample and Model Autocorrelations

Lag          1      2      3      4      5      6
Sample ACF   0.392  0.250  0.178  0.136  0.038  0.044
Model ACF    0.407  0.268  0.178  0.120  0.081  0.055
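The model ACF row of Table 12.3 can be checked directly. A minimal sketch, assuming only numpy, recovers the non-unit eigenvalues of Γ and evaluates the implied ACF at lags 1–6:

import numpy as np

Gamma = np.array([[0.864, 0.117, 0.019],
                  [0.445, 0.538, 0.017],
                  [0.000, 0.298, 0.702]])

eig = np.sort(np.linalg.eigvals(Gamma).real)[::-1]
print(np.round(eig, 3))                     # approx. 1.000, 0.682, 0.422

k = np.arange(1, 7)
rho = 0.5392 * 0.6823**k + 0.0926 * 0.4220**k
print(np.round(rho, 3))                     # 0.407, 0.268, 0.178, 0.120, 0.081, 0.055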
[Plot: decoded state (1, 2, or 3) against week (0–250).]
FIGURE 12.6
Global decoding of soap sales: the sequence of states that is, a posteriori, the most likely. The black dots indicate
the seven occasions on which local decoding led to different states being identified as most likely.
Figure 12.7 displays forecast distributions under this model for weekly sales 1 and 2
weeks into the future, plus the corresponding stationary distribution.
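For readers who wish to reproduce such forecast distributions, a hedged sketch follows: each forecast is a mixture of the state-dependent Poisson distributions, weighted by the h-step-ahead state prediction probabilities. Γ and the week-243 state prediction are taken from the text; the Poisson means lam are hypothetical placeholders, not the reported estimates.

import numpy as np
from scipy.stats import poisson

Gamma = np.array([[0.864, 0.117, 0.019],
                  [0.445, 0.538, 0.017],
                  [0.000, 0.298, 0.702]])
lam = np.array([3.0, 8.0, 15.0])              # hypothetical state-dependent means
p1 = np.array([0.844, 0.138, 0.019])          # one-step-ahead state prediction (from the text)
p2 = p1 @ Gamma                               # two-step-ahead state prediction

counts = np.arange(0, 21)
pois = poisson.pmf(counts[:, None], lam)      # 21 x 3 matrix of state-dependent pmfs
print(np.round(pois @ p1, 3))                 # one-step-ahead forecast distribution
print(np.round(pois @ p2, 3))                 # two-step-ahead forecast distribution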
Finally, under this model the state prediction probabilities for the next 3 weeks, compared to the estimated stationary distribution δ, are given below.
Week   State 1   State 2   State 3
243    0.844     0.138     0.019
244    0.790     0.178     0.031
245    0.763     0.198     0.040
δ      0.722     0.220     0.058
[Plot: probability (0.00–0.20) against count (0–20).]
FIGURE 12.7
Forecast distributions for weekly counts of soap sales: one-step-ahead (left vertical lines), two-step-ahead (middle
vertical lines), stationary distribution (right vertical lines).
In this case, the convergence to the stationary distribution is relatively fast, which is not
surprising as the second largest eigenvalue of Γ (0.6823) is not close to 1.
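The convergence can be verified numerically. A minimal sketch, assuming numpy and taking the week-243 row of the table above as starting point, pushes the state prediction forward by repeated multiplication with Γ:

import numpy as np

Gamma = np.array([[0.864, 0.117, 0.019],
                  [0.445, 0.538, 0.017],
                  [0.000, 0.298, 0.702]])
delta = np.array([0.722, 0.220, 0.058])

p = np.array([0.844, 0.138, 0.019])    # week-243 state prediction (from the text)
for week in range(243, 249):
    print(week, np.round(p, 3))        # first rows approximately reproduce the table
    p = p @ Gamma                      # one further week ahead
print("delta", delta)                  # the predictions approach delta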
12.8 Extensions and Concluding Remarks
One of the principal advantages of the use of HMMs as time series models, in par-
ticular if they are tted by direct numerical maximization of likelihood, is the ease of
extending or adapting the basic models in order to accommodate known or suspected
special features of the data. We have not here dwelt on the many variations that are
possible, such as the modeling of additional dependencies at observation level, at latent
process level, or between these levels. A selection of possibilities is given by Zucchini and
MacDonald (2009, Section 8.6). An example of the last category of additional dependencies
is the model of Zucchini et al. (2008) for a binary time series {X_t} of animal feeding behavior,
which is depicted in Figure 12.8. In that model only the feeding behavior {X_t} is observed,
and the “nutrient levels” {N_t} (an exponentially smoothed version of feeding behavior) are
permitted to influence the transition probabilities governing the latent process {C_t}.
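A simulation sketch may help to fix ideas. The functional forms below (exponential smoothing of X_t into N_t, and a logistic effect of N_t on the switching probability of a two-state chain C_t) are illustrative assumptions only, not the specification of Zucchini et al. (2008):

import numpy as np

rng = np.random.default_rng(1)
alpha, beta0, beta1 = 0.8, -1.0, 2.0     # hypothetical smoothing and link parameters
p_feed = np.array([0.2, 0.8])            # hypothetical P(X_t = 1 | C_t = i)

def simulate(T=200):
    N, C, xs = 0.5, 0, []
    for _ in range(T):
        # the switching probability of the latent chain depends on the current
        # nutrient level N: feedback from the observation level to the state level
        p_switch = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * (N - 0.5))))
        if rng.random() < p_switch:
            C = 1 - C
        x = int(rng.random() < p_feed[C])    # feeding indicator X_t given state C_t
        N = alpha * N + (1 - alpha) * x      # exponentially smoothed feeding history
        xs.append(x)
    return xs

print(simulate(30))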
Other important topics not discussed in this chapter, or described only briefly, are the
use of HMMs as models for longitudinal data, that is, multiple time series; the incorporation
of covariates; the use of Bayesian estimation methods; the structuring of the t.p.m. to
reduce the number of parameters required by an HMM; and the construction of HMMs that
(accurately) approximate less tractable models having a continuous-valued latent Markov
process.
For HMMs as models for longitudinal data, see Altman (2007), Maruotti (2011, 2015),
Schliehe-Diecks et al. (2012), and Bartolucci et al. (2013). For examples of models with
covariates, see Zucchini and MacDonald (2009, Chapter 14). For Bayesian methods, see
the works cited in Section 12.4. For structuring of the transition probability matrix, see, for
example, Cooper and Lipsitch (2004) and Langrock (2011). For discretization of continuous-
valued latent processes and the resulting application of HMM methods, see Langrock
(2011).
To conclude, we suggest that many discrete-valued time series can be usefully modeled
by HMMs or variations thereof, and the models relatively easily fitted by direct numerical
maximization of likelihood; EM and Bayesian methods are obvious alternatives. The
unity—across various types of discrete data—of model structure, and of techniques for
estimation, forecasting, and diagnostic checking, makes HMMs a promising set of models
for a wide variety of discrete-valued time series.

[Figure 12.8: directed graph with nodes N_0–N_3 (nutrient levels), X_1–X_3 (feeding observations), and C_1–C_3 (latent states).]

FIGURE 12.8
Directed graph of animal feeding model of Zucchini et al. (2008).
Acknowledgments
The James M. Kilts Center, University of Chicago Booth School of Business, is thanked for
making available the data analyzed in Section 12.7. The reviewer is thanked for constructive
comments and suggestions.
References
Altman, R. M. (2007). Mixed hidden Markov models: An extension of the hidden Markov model to
the longitudinal data setting. Journal of the American Statistical Association, 102:201–210.
Bartolucci, F., Farcomeni, A., and Pennoni, F. (2013). Latent Markov Models for Longitudinal Data.
Chapman & Hall/CRC Press, Boca Raton, FL.
Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in the
statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics,
41:164–171.
Bulla, J. and Berzel, A. (2008). Computational issues in parameter estimation for stationary hidden
Markov models. Computational Statistics, 23:1–18.
Churchill, G. A. (1989). Stochastic models for heterogeneous DNA sequences. Bulletin of Mathematical
Biology, 51:79–94.
Cooper, B. and Lipsitch, M. (2004). The analysis of hospital infection data using hidden Markov
models. Biostatistics, 5:223–237.
Cox, D. R. (1981). Statistical analysis of time series: Some recent developments. Scandinavian Journal
of Statistics, 8:93–115.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data
via the EM algorithm (with discussion). Journal of the Royal Statistical Society Series B, 39:1–38.
Durbin, R., Eddy, S. R., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis: Probabilistic
Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, U.K.
Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer, New York.
Juang, B. H. and Rabiner, L. R. (1991). Hidden Markov models for speech recognition. Technometrics,
33:251–272.
Langrock, R. (2011). Some applications of nonlinear and non-Gaussian state-space modelling by
means of hidden Markov models. Journal of Applied Statistics, 38(12):2955–2970.
Leroux, B. G. and Puterman, M. L. (1992). Maximum-penalized-likelihood estimation for indepen-
dent and Markov-dependent mixture models. Biometrics, 48(2):545–558.
Little, R. J. A. (2009). Selection and pattern-mixture models. In Fitzmaurice, G., Davidian, M.,
Verbeke, G., and Molenberghs, G., editors, Longitudinal Data Analysis, pp. 409–431. Chapman &
Hall/CRC, Boca Raton, FL.
MacDonald, I. L. (2014). Numerical maximisation of likelihood: A neglected alternative to EM?
International Statistical Review, 82(2):296–308.
Maruotti, A. (2011). Mixed hidden Markov models for longitudinal data: An overview. International
Statistical Review, 79(3):427–454.
Maruotti, A. (2015). Handling non-ignorable dropouts in longitudinal data: A conditional model
based on a latent Markov heterogeneity structure. TEST, 24:84–109.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech
recognition. Proceedings of the IEEE, 77(2):257–286.
Rydén, T. (2008). EM versus Markov chain Monte Carlo for estimation of hidden Markov models: A
computational perspective (with discussion). Bayesian Analysis, 3(4):659–688.
Schliehe-Diecks, S., Kappeler, P., and Langrock, R. (2012). On the application of mixed hidden Markov
models to multiple behavioural time series. Interface Focus, 2:180–189.
Scott, S. L. (2002). Bayesian methods for hidden Markov models: Recursive computing in the
21st century. Journal of the American Statistical Association, 97:337–351.
University of Chicago Booth School of Business (2015). http://research.chicagobooth.edu/kilts/
marketing-databases/dominicks. Accessed April 28, 2015.
Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimal decoding
algorithm. IEEE Transactions on Information Theory, 13:260–269.
Viterbi, A. J. (2006). A personal history of the Viterbi algorithm. IEEE Signal Processing Magazine,
23:120–142.
Welch, L. R. (2003). Hidden Markov models and the Baum–Welch algorithm. IEEE Information Theory
Society Newsletter, 53(1): 10–13.
Zucchini, W. (2000). An introduction to model selection. Journal of Mathematical Psychology,
44(1):41–61.
Zucchini, W. and MacDonald, I. L. (2009). Hidden Markov Models for Time Series: An Introduction Using
R. Chapman & Hall/CRC, London, U.K./Boca Raton, FL.
Zucchini, W., Raubenheimer, D., and MacDonald, I. L. (2008). Modeling time series of animal
behavior by means of a latent-state model with feedback. Biometrics, 64:807–815.