and

$$\hat{v}_{jk}(t) = \Pr(C_{t-1} = j,\, C_t = k \mid \mathbf{x}^{(T)}) = \alpha_{t-1}(j)\,\gamma_{jk}\,p_k(x_t)\,\beta_t(k)/L_T. \qquad (12.7)$$
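As a minimal sketch (not the chapter's code), the quantities $\hat{u}_j(t)$ and $\hat{v}_{jk}(t)$ might be computed in R as follows, assuming the forward probabilities, the backward probabilities, the t.p.m., the state-dependent probabilities $p_k(x_t)$, and the likelihood $L_T$ have already been obtained. All object names are illustrative, and in practice the forward and backward recursions are scaled or kept on a log scale (see the remarks on underflow below).

```r
## Sketch only: posterior state and transition probabilities from the
## forward/backward quantities.  'alpha' and 'beta' are m x n matrices,
## 'gamma' the t.p.m., 'allprobs' an n x m matrix with entries p_k(x_t),
## and 'LT' the likelihood; all names are illustrative.
e_step_quantities <- function(alpha, beta, gamma, allprobs, LT) {
  m <- nrow(alpha); n <- ncol(alpha)
  u_hat <- alpha * beta / LT                 # u_hat[j, t] = Pr(C_t = j | x^(T))
  v_hat <- array(0, dim = c(m, m, n))
  for (t in 2:n)                             # (12.7), for t = 2, ..., T
    v_hat[, , t] <- gamma * outer(alpha[, t - 1], allprobs[t, ] * beta[, t]) / LT
  list(u_hat = u_hat, v_hat = v_hat)
}
```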
• M step: Having replaced $v_{jk}(t)$ and $u_j(t)$ by $\hat{v}_{jk}(t)$ and $\hat{u}_j(t)$, maximize the CDLL, expression (12.5), with respect to the three sets of parameters: the initial distribution $\delta^*$, the t.p.m. $\Gamma$, and the parameters of the state-dependent distributions (e.g., $\lambda_1, \ldots, \lambda_m$ in the case of a simple Poisson–HMM).
Examination of (12.5) reveals that the M step splits here into three separate
maximizations.
1. Set $\delta_j^* = \hat{u}_j(1)\big/\sum_{j=1}^m \hat{u}_j(1) = \hat{u}_j(1)$.
2. Set $\gamma_{jk} = f_{jk}\big/\sum_{k=1}^m f_{jk}$, where $f_{jk} = \sum_{t=2}^T \hat{v}_{jk}(t)$.
3. The maximization of the third term may be easy or difficult, depending on the
nature of the state-dependent distributions assumed. It is essentially the standard
problem of maximum likelihood estimation for the distributions concerned. In the
case of Poisson and normal distributions, closed-form solutions are available. In
some other cases, for example, the gamma distributions and the negative binomial,
numerical maximization—or some modification of EM—will be necessary to carry
out this part of the M step (see the sketch below).
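As a rough illustration, maximizations 1 and 2, together with step 3 specialized to the Poisson case (where the update of each $\lambda_j$ is a weighted mean of the observations, the weights being $\hat{u}_j(t)$), might be coded as follows. Here u_hat and v_hat are the arrays from the E-step sketch above, and the names are again illustrative.

```r
## Sketch of the M step for a simple Poisson-HMM.  'u_hat' is m x n,
## 'v_hat' is m x m x n, 'x' the observations; names are illustrative.
m_step_pois <- function(u_hat, v_hat, x) {
  n <- dim(v_hat)[3]
  delta_star <- u_hat[, 1]                        # step 1: u_hat[, 1] already sums to 1
  f <- apply(v_hat[, , 2:n], c(1, 2), sum)        # f[j, k] = sum_{t=2}^T v_hat[j, k, t]
  gamma <- f / rowSums(f)                         # step 2: normalize each row of f
  lambda <- as.vector(u_hat %*% x) / rowSums(u_hat)  # step 3 (Poisson): weighted means
  list(delta_star = delta_star, gamma = gamma, lambda = lambda)
}
```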
One starts by giving initial estimates of the model parameters. The E and M steps are
then repeated until convergence is achieved. As in the case of likelihood evaluation via
the forward recursion, precautions have to be taken to avoid underflow.
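A skeleton of the resulting iteration is sketched below; e_step() is a hypothetical wrapper around the (suitably scaled) forward and backward recursions that returns u_hat, v_hat, and the log-likelihood, m_step_pois() is the sketch above, and the stopping rule and tolerance are illustrative choices.

```r
## EM skeleton for a Poisson-HMM, iterating until the log-likelihood stabilizes.
## e_step() is assumed to handle the scaling needed to avoid underflow.
em_pois_hmm <- function(x, lambda, gamma, delta_star, tol = 1e-8, maxiter = 500) {
  llk_old <- -Inf
  for (iter in 1:maxiter) {
    es <- e_step(x, lambda, gamma, delta_star)    # E step: u_hat, v_hat, log-likelihood
    ms <- m_step_pois(es$u_hat, es$v_hat, x)      # M step: updated parameter estimates
    lambda <- ms$lambda; gamma <- ms$gamma; delta_star <- ms$delta_star
    if (abs(es$llk - llk_old) < tol) break        # converged
    llk_old <- es$llk
  }
  list(lambda = lambda, gamma = gamma, delta_star = delta_star, llk = es$llk)
}
```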
12.4.3 Remarks
It seems clear from Sections 12.4.1 and 12.4.2 that, for such an HMM, direct numerical max-
imization of the observed data likelihood is at least conceptually simpler than EM. To carry
out the former we need only a likelihood evaluator and a general-purpose optimizer such
as nlm in R. But in order to apply EM, we first compute the forward probabilities (which
are all that is needed to evaluate the likelihood) and then do considerably more, includ-
ing computation of the backward probabilities and the quantities $\hat{u}_j(t)$, $\hat{v}_{jk}(t)$, and $f_{jk}$. In
both methods, it is possible to become trapped in a local maximum which is not the global
maximum, although EM seems less likely to fail in this way than direct numerical maxi-
mization; see Bulla and Berzel (2008, Table 2). However, in HMMs there are in our view
several advantages of numerical maximization over EM, as there seem to be in some other
contexts as well; see MacDonald (2014).
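For comparison, a minimal sketch of the direct approach for a Poisson–HMM: transform to unconstrained "working" parameters (logs of the λ's and of the off-diagonal elements of Γ, the rows then being renormalized), evaluate minus the log-likelihood by a scaled forward recursion, and hand the result to nlm. The parameterization, the names, and the assumption of a stationary initial distribution are illustrative choices, not necessarily the chapter's.

```r
## Minus log-likelihood of a Poisson-HMM, scaled forward recursion.
## 'parvect' holds m log-lambdas followed by the m(m-1) logged off-diagonal
## entries of the t.p.m. (an illustrative working parameterization).
pois_hmm_mllk <- function(parvect, x, m) {
  lambda <- exp(parvect[1:m])
  gamma  <- diag(m)
  gamma[!diag(m)] <- exp(parvect[-(1:m)])
  gamma  <- gamma / rowSums(gamma)
  delta  <- solve(t(diag(m) - gamma + 1), rep(1, m))  # stationary distribution of gamma
  foo    <- delta * dpois(x[1], lambda)
  lscale <- log(sum(foo)); foo <- foo / sum(foo)
  for (t in 2:length(x)) {
    foo    <- (foo %*% gamma) * dpois(x[t], lambda)   # one step of the forward recursion
    lscale <- lscale + log(sum(foo))                  # accumulate log scale factors
    foo    <- foo / sum(foo)                          # rescale to avoid underflow
  }
  -lscale
}

## Illustrative call: parvect0 contains starting values on the working scale.
# fit <- nlm(pois_hmm_mllk, p = parvect0, x = x, m = 2)
```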
The assumption of stationarity often seems appropriate in time series applications.
However, for HMMs fitted by EM it is almost never assumed that the underlying (homo-
geneous) Markov chain is stationary, that is, that $\delta^* = \delta$. (The MLE of $\delta^*$ then turns out to
be a unit vector: one element is 1 and the others 0.) One obvious reason for not assuming
stationarity is convenience. For stationary series, there is no explicit formula for the matrix
$\Gamma$ which maximizes term 1 + term 2 of the CDLL in the M step; see, for example, Bulla
and Berzel (2008, Section 2.3). It is not difficult to carry out the maximization numerically,
but that implies a numerical optimization within each M step, that is, a maximization loop