8
Dynamic Bayesian Models for Discrete-Valued
Time Series
Dani Gamerman, Carlos A. Abanto-Valle, Ralph S. Silva, and Thiago G. Martins
CONTENTS
8.1 Introduction
8.2 MCMC
  8.2.1 Updating the States
    8.2.1.1 Single-Move Update for the States
    8.2.1.2 Multimove Update for the States
8.3 Sequential Monte Carlo
  8.3.1 Particle Filter
  8.3.2 Adaptive Random Walk Metropolis Sampling
8.4 INLA
  8.4.1 INLA Methodology
  8.4.2 R-INLA through Examples
8.5 Applications
  8.5.1 Deep Brain Stimulation
    8.5.1.1 Computation Details
    8.5.1.2 Results
  8.5.2 Poliomyelitis in the U.S.
    8.5.2.1 Computation Details
    8.5.2.2 Results
8.6 Final Remarks
8A Appendix
  8A.1 Deep Brain Stimulation
  8A.2 Poliomyelitis in the U.S.
References
8.1 Introduction
State-space models (SSMs) have been discussed in the literature for a number of decades.
They are models that rely on a decomposition that separates the observational errors from
the temporal evolution. The former usually consists of temporally independent specifications
that handle the characteristics of the observational process. The latter is devised to
describe the temporal dependence at a latent, unobserved level through evolution disturbances.
In the most general form, the observational and evolution disturbances may be
related, but in a typical set-up they are independent. SSMs were originally introduced for
Gaussian, hence continuous, time series data, but the above decomposition made it easy
to extend them to discrete-valued time series. This chapter describes SSMs with a view
towards their use for such data.
The use of SSM by the statistical time series community has become widespread since the
books of Harvey (1989) and West and Harrison (1997). These books provided an extensive
account of the possibilities of SSM from the classical and Bayesian perspectives, respectively.
Another surge of interest has occurred more recently with the development of
sequential Monte Carlo (SMC) methods, allowing for approximate online inference; see
the seminal paper by Gordon et al. (1993).
The basic framework upon which this chapter relies is called the dynamic generalized
linear model (DGLM). It is a special case of SSM, and was introduced by West et al. (1985).
Consider a discrete-valued time series y_1, ..., y_T and let EF(μ, φ) denote an exponential
family distribution with mean μ and variance φ c(μ), for some variance function c. The SSM
decomposition of the DGLM is given, for t = 1, ..., T, by the equations

Observation equation:  y_t | x_t, θ ∼ EF(μ_t, φ),   (8.1)
Link function:         g(μ_t) = z_t′ x_t,   (8.2)
System equation:       x_t = G_t x_{t−1} + w_t,  where w_t | θ ∼ N(0, W),   (8.3)
where z_t is a known vector (possibly including covariates) at time t, x_t is a time-dependent
latent state at time t, and θ is a vector of hyperparameters including φ and unknown
components of G_t and W. The model is completed with a prior specification for the initial
latent state x_0. A Bayesian formulation would also require a prior distribution for the
hyperparameter θ. The above model formulation considers only linear models both at the
link relation and the system evolution levels. A non-Gaussian evolution with nice integration
properties was proposed by Gamerman et al. (2013) to replace (8.3). It includes a few
discrete observational models but is not as general as the above formulation.
Usual features of time series can be represented in the above formulation. For example,
local linear trends are specified with

z_t = (1, 0)′,   G_t = ( 1  1
                         0  1 )   and   x_t = (α_t, β_t)′.

In this case, α represents the local level of the series and β represents the local growth in the series.
Another common feature of time series is seasonality. There are a few related ways to
represent seasonal patterns in time series. Perhaps the simplest representation is the structural
form of Harvey (1989), where the seasonal effects s_t are stochastically related via

s_t = −(s_{t−1} + s_{t−2} + ··· + s_{t−p+1}) + η_t,   for all t,   (8.4)

for seasonal cycles of length p. Deterministic or static seasonal terms are obtained in the
limiting case of η_t = 0, a.s., thus implying that Σ_{i=0}^{p−1} s_{t−i} = 0, for all t. Evolution (8.3) is
recovered by forming the latent component x_t = (s_t, s_{t−1}, ..., s_{t−p+1})′ with z_t = (1, 0_{p−1}′)′,

G_t = ( −1_{p−1}′   0
         I_{p−1}    0_{p−1} )

and w_t = (η_t, 0_{p−1}′)′, where I_m, 1_m, and 0_m are the identity matrix, vector
of 1s and vector of 0s of order m, respectively.
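To make this block construction concrete, the following R sketch (not taken from the chapter; the cycle length p = 4 and the variance values are purely illustrative assumptions) assembles z_t, G_t, and W for a model combining a local linear trend with the structural seasonal form above.

## Minimal sketch: DGLM components for a local linear trend plus a seasonal
## cycle of (assumed) length p = 4, following the block construction above.
p <- 4

## Local linear trend block, with state (alpha_t, beta_t)'
G_trend <- matrix(c(1, 1,
                    0, 1), nrow = 2, byrow = TRUE)
z_trend <- c(1, 0)

## Seasonal block, with state (s_t, s_{t-1}, ..., s_{t-p+1})'
G_seas <- rbind(c(rep(-1, p - 1), 0),
                cbind(diag(p - 1), rep(0, p - 1)))
z_seas <- c(1, rep(0, p - 1))

## Stack the blocks: the full state is (alpha_t, beta_t, s_t, ..., s_{t-p+1})'
k <- 2 + p
G <- matrix(0, k, k)
G[1:2, 1:2] <- G_trend
G[3:k, 3:k] <- G_seas
z <- c(z_trend, z_seas)

## Evolution covariance (illustrative values): disturbances enter only the
## level, the growth and the current seasonal effect
W <- diag(c(0.05, 0.01, 0.10, rep(0, p - 1)))

When the trend and seasonal structure are time-invariant, the same G and z can be reused at every t.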
By far, the most common discrete-valued specifications are the Poisson and binomial
distributions. The Poisson distribution is usually assumed in the analysis of time series of
counts. The most popular model for time series of counts is the log-linear dynamic model
given by

Observation equation:  y_t | x_t ∼ Poisson(μ_t),   for t = 1, ..., T,   (8.5)
Link function:         log(μ_t) = z_t′ x_t,        for t = 1, ..., T,   (8.6)
with system equation (8.3). For binomial-type data, the most popular model is the dynamic
logistic regression given by

Observation equation:  y_t | x_t, θ ∼ Bin(n_t, π_t),   for t = 1, ..., T,   (8.7)
Link function:         logit(π_t) = z_t′ x_t,          for t = 1, ..., T,   (8.8)

with system equation (8.3). Similar models are obtained if the logit link is replaced by the
probit or complementary log–log links.
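As a simple illustration of the log-linear dynamic model, the R sketch below (not from the chapter) simulates a count series from (8.5) and (8.6) with a scalar random-walk state, that is, z_t = 1 and G_t = 1; the series length, evolution variance, and initial level are arbitrary choices.

## Minimal sketch: simulating from the Poisson log-linear DGLM (8.5)-(8.6)
## with a scalar random-walk state (assumed values throughout).
set.seed(123)
n  <- 200          # series length T in the notation above
W  <- 0.05         # evolution variance
x0 <- log(5)       # initial level
x  <- numeric(n)
y  <- integer(n)
x_prev <- x0
for (t in 1:n) {
  x[t] <- x_prev + rnorm(1, 0, sqrt(W))   # system equation (8.3)
  y[t] <- rpois(1, exp(x[t]))             # observation (8.5) with log link (8.6)
  x_prev <- x[t]
}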
A number of extensions/variations can be contemplated:

• Nonlinear models can be considered at the link relation (8.2) and/or at the system
  evolution (8.3);
• Some components of the latent state x_t may be fixed over time. The generalized
  linear models (GLM) (Nelder and Wedderburn, 1972) are obtained in the static,
  limiting case that all components of x_t are fixed;
• The observational equation (8.1) may be robustified to account for overdispersion
  (Gamerman, 1997);
• The link function (8.2) may be generalized to allow for more flexible forms via
  parametric (Abanto-Valle and Dey, 2014) or nonparametric (Mallick and Gelfand,
  1994) mixtures; and
• The system equation disturbances may be generalized by replacement of
  Gaussianity by robustified forms (Meinhold and Singpurwalla, 1989) or by skew
  forms (Valdebenito et al., 2015).
Data overdispersion is frequently encountered in discrete-valued time series observed in
human-related studies. It can be accommodated in the DGLM formulation (8.1) through
(8.3) via additional random components in the link functions (8.6) and (8.8). These additional
random terms cause extra variability at the observational level, forcing a data
dispersion larger than that prescribed by the canonical model. These terms may be included
in conjugate fashion, thus rendering negative binomial and beta-binomial distributions to
replace the Poisson and binomial distributions, respectively, leading to hierarchical GLM
(Lee and Nelder, 1999). Alternatively, random terms may be added to the linear predictors
z_t′ x_t in the link equations (Ferreira and Gamerman, 2000). The resulting distributions are
also overdispersed but no longer available analytically in closed form. Their main features
resemble those of the corresponding negative binomial and beta-binomial distributions, for
N(0, σ²) random terms.
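The variance inflation caused by an N(0, σ²) term in the log link can be seen directly by simulation. The R sketch below (illustrative values only, with a fixed linear predictor for simplicity) compares the variance-to-mean ratio of canonical Poisson counts with that of counts generated after adding the extra random term.

## Minimal sketch: overdispersion from an extra N(0, sigma2) term in the log link.
set.seed(321)
n      <- 500
eta    <- log(10)   # a fixed linear predictor, for illustration
sigma2 <- 0.3
y_pois <- rpois(n, exp(eta))                              # canonical Poisson counts
y_over <- rpois(n, exp(eta + rnorm(n, 0, sqrt(sigma2))))  # overdispersed counts

## Variance-to-mean ratios: close to 1 for the Poisson case, larger otherwise
c(var(y_pois) / mean(y_pois), var(y_over) / mean(y_over))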
Inference can be performed in two different ways: sequentially or in a single block.
From a Bayesian perspective, these forms translate into obtaining the sequence of
distributions of [(x_t, θ) | y^t], for t = 1, ..., T, or the distribution of [(x_1, ..., x_T, θ) | y^T],
respectively, where y^t = {y_0, y_1, ..., y_t} and y_0 represents the initial information. The
sequential approach is obtained via iterated use of Bayes' theorem

p(x_t, θ | y^t) ∝ p(y_t | x_t, θ) p(x_t | y^{t−1}, θ) p(θ | y^{t−1}),   (8.9)
where the rst term on the right side is given by (8.1). The second term on the right side is
obtained iteratively via
p x
t
| y
t1
, θ = p x
t
| x
t1
, y
t1
, θ p x
t1
| y
t1
, θ dx
t1
, (8.10)
where the rst term in the integrand is given by (8.3).
Single-block inference is performed by a single pass of Bayes' theorem as

p(x_0, x_1, ..., x_T, θ | y^T) ∝ [ ∏_{t=1}^{T} p(y_t | x_t, θ) ] × [ ∏_{t=1}^{T} p(x_t | x_{t−1}, θ) ] × p(x_0) p(θ),   (8.11)

where the terms in the products above are respectively given by (8.1) and (8.3).
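In the same scalar special case (z_t = 1, G_t = 1, θ = W), the logarithm of the unnormalized single-block posterior (8.11) takes only a few lines of R; the prior terms p(x_0) and p(θ) are omitted from the sketch for brevity.

## Minimal sketch: log of the unnormalized joint posterior (8.11) for the
## Poisson random-walk model, up to the prior terms p(x_0) and p(theta).
log_joint <- function(x, y, W, x0) {
  sum(dpois(y, exp(x), log = TRUE)) +                         # sum_t log p(y_t | x_t, theta)
    sum(dnorm(x, c(x0, x[-length(x)]), sqrt(W), log = TRUE))  # sum_t log p(x_t | x_{t-1}, theta)
}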
Inference may also be performed from a classical perspective. In this case, a likelihood
approach would use the above posterior distribution as a penalized likelihood, probably
with removal of the prior p(θ). This route is pursued in Durbin and Koopman (2001) (see
also Davis and Dunsmuir [2015; Chapter 6 in this volume]). This chapter will concentrate
on the Bayesian paradigm.

The resulting distributions are analytically intractable for all cases discussed above and
approximations must be used. The following sections describe some of the techniques that
are currently being used to approximate the required distributions. They are Markov chain
Monte Carlo (MCMC), SMC (or particle filters), and integrated nested Laplace approximations
(INLA). These techniques do not exhaust the range of possibilities for approximations,
but are the most widely used techniques nowadays. Finally, the techniques are applied to
real datasets under a variety of model formulations in order to illustrate the usefulness of
the SSM formulation.
8.2 MCMC
We describe MCMC methods in the context of the general SSM given in (8.1)–(8.3).
A detailed review on this subject can be found in Fearnhead (2011) and Migon et al.
(2005). From the Bayesian perspective, inference in a general SSM targets the joint posterior
distribution of the parameters and the hidden states x^T = (x_1, ..., x_T), namely
p(θ, x^T | y^T), which is given by (8.11). A Markov chain whose state is (θ, x^T) and whose
stationary distribution is the joint posterior distribution p(θ, x^T | y^T) is then constructed.
A realization of this chain is generated until convergence is reached. After convergence,
the following iterations of the chain can be used to form a sample from the posterior
distribution. The Gibbs sampler, iteratively drawing samples from p(x^T | y^T, θ) and
p(θ | x^T, y^T), is the most popular method to sample from such a posterior distribution.
In practice, sampling from p(θ | x^T, y^T) is often easy, whereas designing a sampler for
p(x^T | y^T, θ) is trickier due to the high posterior correlation that usually occurs between
the states. Next, we will describe approaches to sample the states.
8.2.1 Updating the States
The simplest procedure is to update the components of the states x^T one at a time in a
single-move fashion (Carlin et al., 1992; Geweke and Tanizaki, 2001). However, due to the
severe correlation between states, such a sampler may lead to slow mixing. In such cases it
is better to update the states in a multimove fashion, as blocks of states
x_{r,s} = (x_r, x_{r+1}, ..., x_s)′, or to update the whole state process x^T (Shephard and Pitt, 1997;
Carter and Kohn, 1994, 1996).
8.2.1.1 Single-Move Update for the States
Carlin et al. (1992) and Geweke and Tanizaki (2001) introduced the Gibbs sampler and the
Metropolis–Hastings algorithms to perform inference for nonnormal and nonlinear SSM
in a single-move fashion. For sampling the states, a sequential sampler that updates each
state conditional on the rest of the states is used. Such an approach is easy to implement
and, due to the Markovian evolution of the states, the conditional distribution of each state
given all the others reduces to conditioning only on its two adjacent states:

p(x_t | y_t, x_{t−1}, x_{t+1}, θ) ∝
    p(y_t | x_t, θ) p(x_{t+1} | x_t, θ) p(x_t | x_{t−1}, θ),   t < T,
    p(y_t | x_t, θ) p(x_t | x_{t−1}, θ),                       t = T (end point).   (8.12)
In some situations, we can simulate directly from the full conditional distribution, and
such moves will always be accepted. Where this is not possible, a Metropolis–Hastings
step within the Gibbs sampler can often be implemented. Geweke and Tanizaki (2001) give
a detailed discussion of several proposals in this situation.
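To illustrate, the R sketch below implements one sweep of a single-move random-walk Metropolis update based on (8.12) for the scalar Poisson random-walk model used in the earlier sketches; the proposal step size and the fixed initial state x_0 are assumptions made for the example, not choices from the chapter.

## Minimal sketch: one single-move Metropolis sweep over the states, using the
## full conditional (8.12) for a Poisson model with scalar random-walk state.
single_move_sweep <- function(x, y, W, x0, step = 0.2) {
  n <- length(y)
  for (t in 1:n) {
    prop   <- x[t] + rnorm(1, 0, step)               # random-walk proposal
    x_prev <- if (t == 1) x0 else x[t - 1]
    log_num <- dpois(y[t], exp(prop), log = TRUE) +
               dnorm(prop, x_prev, sqrt(W), log = TRUE)
    log_den <- dpois(y[t], exp(x[t]), log = TRUE) +
               dnorm(x[t], x_prev, sqrt(W), log = TRUE)
    if (t < n) {                                      # the end point t = T drops this term
      log_num <- log_num + dnorm(x[t + 1], prop, sqrt(W), log = TRUE)
      log_den <- log_den + dnorm(x[t + 1], x[t], sqrt(W), log = TRUE)
    }
    if (log(runif(1)) < log_num - log_den) x[t] <- prop   # Metropolis accept/reject
  }
  x
}

Within the Gibbs sampler described at the start of Section 8.2, such a sweep would alternate with a draw of θ from p(θ | x^T, y^T).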
8.2.1.2 Multimove Update for the States
While single-move samplers are easy to implement, the resulting MCMC algorithms can
mix slowly if there is strong dependence in the state process. Updating the states in a
multimove fashion could be an alternative approach to overcome this problem. Ideally, we
would update the whole state process in one move. The simulation smoother of de Jong
and Shephard (1995) can be used to sample the states.
In situations where it is not possible to update the whole state process, Shephard
and Pitt (1997) and Watanabe (2004) propose sampling random blocks of the disturbances
w_{r+1,s} = (w_{r+1}, ..., w_s)′ (equivalently, x_{r+1,s}), using as proposal transition density
a second-order Taylor expansion of l_t = log p(y_t | ζ_t) in the full conditional of
w_{r+1,s}, where h(ζ_t) = d_t = z_t′ x_t. The proposal density is then the multivariate normal
with pseudo-observations

ŷ_t = z_t′ x̂_t + V̂_t l_t′(ζ̂_t)   and   V̂_t = −{l_t″(ζ̂_t)}^{−1},   for t = r+1, ..., s−1 and t = T.

For t = s < T, we have

ŷ_t = V̂_t [ z_t {l_t′(ζ̂_t) − l_t″(ζ̂_t) z_t′ x̂_t} + G_{t+1}′ W^{−1} x_{t+1} ]   and
V̂_t = { G_{t+1}′ W^{−1} G_{t+1} − l_t″(ζ̂_t) z_t z_t′ }^{−1}.

Then the linear SSM with pseudo-observations ŷ_t is defined as

ŷ_t = z_t′ x_t + ν_t,   t = r+1, ..., s−1 and t = T,
ŷ_t = x_t + ν_t,        t = s < T,                              (8.13)
x_t = G_t x_{t−1} + w_t,   for all t.
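For concreteness, in the Poisson log-linear case l_t(ζ_t) = y_t ζ_t − exp(ζ_t) up to an additive constant, so that l_t′(ζ_t) = y_t − exp(ζ_t) and l_t″(ζ_t) = −exp(ζ_t). The R sketch below (an illustration only, assuming ζ̂_t = z_t′ x̂_t) computes the interior-point pseudo-observations ŷ_t and variances V̂_t that feed the approximating linear SSM (8.13).

## Minimal sketch: interior-point pseudo-observations for the Poisson log-linear
## case, where l'(zeta) = y - exp(zeta) and l''(zeta) = -exp(zeta).
pseudo_obs <- function(y, zeta_hat) {
  V_hat <- exp(-zeta_hat)                            # -1 / l''(zeta_hat)
  y_hat <- zeta_hat + V_hat * (y - exp(zeta_hat))    # zeta_hat + V_hat * l'(zeta_hat)
  list(y_hat = y_hat, V_hat = V_hat)
}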