8
Dynamic Bayesian Models for Discrete-Valued
Time Series
Dani Gamerman, Carlos A. Abanto-Valle, Ralph S. Silva, and Thiago G. Martins
CONTENTS
8.1 Introduction
8.2 MCMC
  8.2.1 Updating the States
    8.2.1.1 Single-Move Update for the States
    8.2.1.2 Multimove Update for the States
8.3 Sequential Monte Carlo
  8.3.1 Particle Filter
  8.3.2 Adaptive Random Walk Metropolis Sampling
8.4 INLA
  8.4.1 INLA Methodology
  8.4.2 R-INLA through Examples
8.5 Applications
  8.5.1 Deep Brain Stimulation
    8.5.1.1 Computation Details
    8.5.1.2 Results
  8.5.2 Poliomyelitis in the U.S.
    8.5.2.1 Computation Details
    8.5.2.2 Results
8.6 Final Remarks
8A Appendix
  8A.1 Deep Brain Stimulation
  8A.2 Poliomyelitis in the U.S.
References
8.1 Introduction
State-space models (SSMs) have been discussed in the literature for a number of decades.
They are models that rely on a decomposition that separates the observational errors from
the temporal evolution. The former usually consists of temporally independent specifications
that handle the characteristics of the observational process. The latter is devised to
describe the temporal dependence at a latent, unobserved level through evolution disturbances.
In the most general form, the observational and evolution disturbances may be
related, but in a typical set-up they are independent. SSMs were originally introduced for
Gaussian, hence continuous, time series data, but the above decomposition made it easy
to extend them to discrete-valued time series. This chapter describes SSMs with a view
towards their use for such data.
The use of SSM by the statistical time series community has become widespread since the
books of Harvey (1989) and West and Harrison (1997). These books provided an extensive
account of the possibilities of SSM from the classical and Bayesian perspectives, respectively.
Another surge of interest has occurred more recently with the development of
sequential Monte Carlo (SMC) methods, allowing for approximate online inference; see
the seminal paper by Gordon et al. (1993).
The basic framework upon which this chapter relies is called the dynamic generalized
linear model (DGLM). It is a special case of SSM, and was introduced by West et al. (1985).
Consider a discrete-valued time series y_1, ..., y_T and let EF(μ, φ) denote an exponential
family distribution with mean μ and variance φ c(μ), for some variance function c. The SSM
decomposition of the DGLM is given, for t = 1, ..., T, by the equations

Observation equation:  y_t | x_t, θ ∼ EF(μ_t, φ),   (8.1)
Link function:         g(μ_t) = z_t′ x_t,   (8.2)
System equation:       x_t = G_t x_{t−1} + w_t,  where w_t | θ ∼ N(0, W),   (8.3)
where z_t is a known vector (possibly including covariates) at time t, x_t is a time-dependent
latent state at time t, and θ is a vector of hyperparameters including φ and unknown
components of G_t and W. The model is completed with a prior specification for the initial
latent state x_0. A Bayesian formulation would also require a prior distribution for the
hyperparameter θ. The above model formulation considers only linear models both at the
link relation and the system evolution levels. A non-Gaussian evolution with nice integration
properties was proposed by Gamerman et al. (2013) to replace (8.3). It includes a few
discrete observational models but is not as general as the above formulation.
Usual features of time series can be represented in the above formulation. For example,
local linear trends are specified with

z_t = (1, 0)′,   G_t = ( 1  1
                         0  1 )   and   x_t = (α_t, β_t)′.

In this case, α represents the local level of the series and β represents the local growth in the series.
Another common feature of time series is seasonality. There are a few related ways to
represent seasonal patterns in time series. Perhaps the simplest representation is the structural
form of Harvey (1989), where the seasonal effects s_t are stochastically related via

s_t = −(s_{t−1} + s_{t−2} + ··· + s_{t−p+1}) + η_t,   for all t,   (8.4)

for seasonal cycles of length p. Deterministic or static seasonal terms are obtained in the
limiting case of η_t = 0, a.s., thus implying that Σ_{i=0}^{p−1} s_{t−i} = 0, for all t. Evolution (8.3) is
recovered by forming the latent component x_t = (s_t, s_{t−1}, ..., s_{t−p+1})′ with z_t = (1, 0_{p−1}′)′,

G_t = ( −1_{p−1}′   0
         I_{p−1}    0_{p−1} )

and w_t = (η_t, 0_{p−1}′)′, where I_m, 1_m, and 0_m are the identity matrix, vector
of 1s and vector of 0s of order m, respectively.
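To make this block construction concrete, the following R sketch (not taken from the chapter; the cycle length p = 4 and the variance values are purely illustrative assumptions) assembles z_t, G_t, and W for a model combining a local linear trend with the structural seasonal form above.

## Minimal sketch: DGLM components for a local linear trend plus a seasonal
## cycle of (assumed) length p = 4, following the block construction above.
p <- 4

## Local linear trend block, with state (alpha_t, beta_t)'
G_trend <- matrix(c(1, 1,
                    0, 1), nrow = 2, byrow = TRUE)
z_trend <- c(1, 0)

## Seasonal block, with state (s_t, s_{t-1}, ..., s_{t-p+1})'
G_seas <- rbind(c(rep(-1, p - 1), 0),
                cbind(diag(p - 1), rep(0, p - 1)))
z_seas <- c(1, rep(0, p - 1))

## Stack the blocks: the full state is (alpha_t, beta_t, s_t, ..., s_{t-p+1})'
k <- 2 + p
G <- matrix(0, k, k)
G[1:2, 1:2] <- G_trend
G[3:k, 3:k] <- G_seas
z <- c(z_trend, z_seas)

## Evolution covariance (illustrative values): disturbances enter only the
## level, the growth and the current seasonal effect
W <- diag(c(0.05, 0.01, 0.10, rep(0, p - 1)))

When the trend and seasonal structure are time-invariant, the same G and z can be reused at every t.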
By far, the most common discrete-valued specifications are the Poisson and binomial
distributions. The Poisson distribution is usually assumed in the analysis of time series of
counts. The most popular model for time series of counts is the log-linear dynamic model
given by

Observation equation:  y_t | x_t ∼ Poisson(μ_t),   for t = 1, ..., T,   (8.5)
Link function:         log(μ_t) = z_t′ x_t,        for t = 1, ..., T,   (8.6)
with system equation (8.3). For binomial-type data, the most popular model is the dynamic
logistic regression given by

Observation equation:  y_t | x_t, θ ∼ Bin(n_t, π_t),   for t = 1, ..., T,   (8.7)
Link function:         logit(π_t) = z_t′ x_t,          for t = 1, ..., T,   (8.8)

with system equation (8.3). Similar models are obtained if the logit link is replaced by the
probit or complementary log–log links.
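As a simple illustration of the log-linear dynamic model, the R sketch below (not from the chapter) simulates a count series from (8.5) and (8.6) with a scalar random-walk state, that is, z_t = 1 and G_t = 1; the series length, evolution variance, and initial level are arbitrary choices.

## Minimal sketch: simulating from the Poisson log-linear DGLM (8.5)-(8.6)
## with a scalar random-walk state (assumed values throughout).
set.seed(123)
n  <- 200          # series length T in the notation above
W  <- 0.05         # evolution variance
x0 <- log(5)       # initial level
x  <- numeric(n)
y  <- integer(n)
x_prev <- x0
for (t in 1:n) {
  x[t] <- x_prev + rnorm(1, 0, sqrt(W))   # system equation (8.3)
  y[t] <- rpois(1, exp(x[t]))             # observation (8.5) with log link (8.6)
  x_prev <- x[t]
}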
A number of extensions/variations can be contemplated:

• Nonlinear models can be considered at the link relation (8.2) and/or at the system
  evolution (8.3);
• Some components of the latent state x_t may be fixed over time. The generalized
  linear models (GLM) (Nelder and Wedderburn, 1972) are obtained in the static,
  limiting case that all components of x_t are fixed;
• The observational equation (8.1) may be robustified to account for overdispersion
  (Gamerman, 1997);
• The link function (8.2) may be generalized to allow for more flexible forms via
  parametric (Abanto-Valle and Dey, 2014) or nonparametric (Mallick and Gelfand,
  1994) mixtures; and
• The system equation disturbances may be generalized by replacement of
  Gaussianity by robustified forms (Meinhold and Singpurwalla, 1989) or by skew
  forms (Valdebenito et al., 2015).
Data overdispersion is frequently encountered in discrete-valued time series observed in
human-related studies. It can be accommodated in the DGLM formulation (8.1) through
(8.3) via additional random components in the link functions (8.6) and (8.8). These additional
random terms cause extra variability at the observational level, forcing a data
dispersion larger than that prescribed by the canonical model. These terms may be included
in conjugate fashion, thus rendering negative binomial and beta-binomial distributions to
replace the Poisson and binomial distributions, respectively, leading to hierarchical GLM
(Lee and Nelder, 1999). Alternatively, random terms may be added to the linear predictors
z_t′ x_t in the link equations (Ferreira and Gamerman, 2000). The resulting distributions are
also overdispersed but no longer available analytically in closed form. Their main features
resemble those of the corresponding negative binomial and beta-binomial distributions, for
N(0, σ²) random terms.
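The variance inflation caused by an N(0, σ²) term in the log link can be seen directly by simulation. The R sketch below (illustrative values only, with a fixed linear predictor for simplicity) compares the variance-to-mean ratio of canonical Poisson counts with that of counts generated after adding the extra random term.

## Minimal sketch: overdispersion from an extra N(0, sigma2) term in the log link.
set.seed(321)
n      <- 500
eta    <- log(10)   # a fixed linear predictor, for illustration
sigma2 <- 0.3
y_pois <- rpois(n, exp(eta))                              # canonical Poisson counts
y_over <- rpois(n, exp(eta + rnorm(n, 0, sqrt(sigma2))))  # overdispersed counts

## Variance-to-mean ratios: close to 1 for the Poisson case, larger otherwise
c(var(y_pois) / mean(y_pois), var(y_over) / mean(y_over))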
Inference can be performed in two different ways: sequentially or in a single block.
From a Bayesian perspective, these forms translate into obtaining the sequence of
distributions of [(x_t, θ) | y^t], for t = 1, ..., T, or the distribution of [(x_1, ..., x_T, θ) | y^T],
respectively, where y^t = {y_0, y_1, ..., y_t} and y_0 represents the initial information. The
sequential approach is obtained via iterated use of Bayes' theorem

p(x_t, θ | y^t) ∝ p(y_t | x_t, θ) p(x_t | y^{t−1}, θ) p(θ | y^{t−1}),   (8.9)
where the rst term on the right side is given by (8.1). The second term on the right side is
obtained iteratively via
p x
t
| y
t1
, θ = p x
t
| x
t1
, y
t1
, θ p x
t1
| y
t1
, θ dx
t1
, (8.10)
where the rst term in the integrand is given by (8.3).
Single-block inference is performed by a single pass of Bayes' theorem as

p(x_0, x_1, ..., x_T, θ | y^T) ∝ [ ∏_{t=1}^{T} p(y_t | x_t, θ) ] × [ ∏_{t=1}^{T} p(x_t | x_{t−1}, θ) ] × p(x_0) p(θ),   (8.11)

where the terms in the products above are respectively given by (8.1) and (8.3).
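In the same scalar special case (z_t = 1, G_t = 1, θ = W), the logarithm of the unnormalized single-block posterior (8.11) takes only a few lines of R; the prior terms p(x_0) and p(θ) are omitted from the sketch for brevity.

## Minimal sketch: log of the unnormalized joint posterior (8.11) for the
## Poisson random-walk model, up to the prior terms p(x_0) and p(theta).
log_joint <- function(x, y, W, x0) {
  sum(dpois(y, exp(x), log = TRUE)) +                         # sum_t log p(y_t | x_t, theta)
    sum(dnorm(x, c(x0, x[-length(x)]), sqrt(W), log = TRUE))  # sum_t log p(x_t | x_{t-1}, theta)
}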
Inference may also be performed from a classical perspective. In this case, a likelihood
approach would use the above posterior distribution as a penalized likelihood, probably
with removal of the prior p(θ). This route is pursued in Durbin and Koopman (2001) (see
also Davis and Dunsmuir [2015; Chapter 6 in this volume]). This chapter will concentrate
on the Bayesian paradigm.

The resulting distributions are analytically intractable for all cases discussed above and
approximations must be used. The following sections describe some of the techniques that
are currently being used to approximate the required distributions. They are Markov chain
Monte Carlo (MCMC), SMC (or particle filters), and integrated nested Laplace approximations
(INLA). These techniques do not exhaust the range of possibilities for approximations,
but are the most widely used techniques nowadays. Finally, the techniques are applied to
real datasets under a variety of model formulations in order to illustrate the usefulness of
the SSM formulation.
8.2 MCMC
We describe MCMC methods in the context of the general SSM given in (8.1)–(8.3).
A detailed review on this subject can be found in Fearnhead (2011) and Migon et al.
(2005). From the Bayesian perspective, inference in a general SSM targets the joint posterior
distribution of the parameters and the hidden states x^T = (x_1, ..., x_T), namely
p(θ, x^T | y^T), which is given by (8.11). A Markov chain whose state is (θ, x^T) and whose
stationary distribution is the joint posterior distribution p(θ, x^T | y^T) is then constructed.
A realization of this chain is generated until convergence is reached. After convergence,
the following iterations of the chain can be used to form a sample from the posterior
distribution. The Gibbs sampler, iteratively drawing samples from p(x^T | y^T, θ) and
p(θ | x^T, y^T), is the most popular method to sample from such a posterior distribution.
In practice, sampling from p(θ | x^T, y^T) is often easy, whereas designing a sampler for
p(x^T | y^T, θ) is trickier due to the high posterior correlation that usually occurs between
the states. Next, we will describe approaches to sample the states.
8.2.1 Updating the States
The simplest procedure is to update the components of the states x^T one at a time in a
single-move fashion (Carlin et al., 1992; Geweke and Tanizaki, 2001). However, due to the
severe correlation between states, such a sampler may lead to slow mixing. In such cases it
is better to update the states in a multimove fashion, as blocks of states
x_{r,s} = (x_r, x_{r+1}, ..., x_s)′, or to update the whole state process x^T (Shephard and Pitt, 1997;
Carter and Kohn, 1994, 1996).
8.2.1.1 Single-Move Update for the States
Carlin et al. (1992) and Geweke and Tanizaki (2001) introduced the Gibbs sampler and the
Metropolis–Hastings algorithms to perform inference for nonnormal and nonlinear SSM
in a single-move fashion. For sampling the states, a sequential sampler that updates each
state conditional on the rest of the states is used. Such an approach is easy to implement
and, due to the Markovian evolution of the states, the conditional distribution of each state
given all the others reduces to conditioning only on its two adjacent states:

p(x_t | y_t, x_{t−1}, x_{t+1}, θ) ∝
    p(y_t | x_t, θ) p(x_{t+1} | x_t, θ) p(x_t | x_{t−1}, θ),   t < T,
    p(y_t | x_t, θ) p(x_t | x_{t−1}, θ),                       t = T (end point).   (8.12)
In some situations, we can simulate directly from the full conditional distribution, and
such moves will always be accepted. Where this is not possible, a Metropolis–Hastings
step within the Gibbs sampler can often be implemented. Geweke and Tanizaki (2001) give
a detailed discussion of several proposals in this situation.
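To illustrate, the R sketch below implements one sweep of a single-move random-walk Metropolis update based on (8.12) for the scalar Poisson random-walk model used in the earlier sketches; the proposal step size and the fixed initial state x_0 are assumptions made for the example, not choices from the chapter.

## Minimal sketch: one single-move Metropolis sweep over the states, using the
## full conditional (8.12) for a Poisson model with scalar random-walk state.
single_move_sweep <- function(x, y, W, x0, step = 0.2) {
  n <- length(y)
  for (t in 1:n) {
    prop   <- x[t] + rnorm(1, 0, step)               # random-walk proposal
    x_prev <- if (t == 1) x0 else x[t - 1]
    log_num <- dpois(y[t], exp(prop), log = TRUE) +
               dnorm(prop, x_prev, sqrt(W), log = TRUE)
    log_den <- dpois(y[t], exp(x[t]), log = TRUE) +
               dnorm(x[t], x_prev, sqrt(W), log = TRUE)
    if (t < n) {                                      # the end point t = T drops this term
      log_num <- log_num + dnorm(x[t + 1], prop, sqrt(W), log = TRUE)
      log_den <- log_den + dnorm(x[t + 1], x[t], sqrt(W), log = TRUE)
    }
    if (log(runif(1)) < log_num - log_den) x[t] <- prop   # Metropolis accept/reject
  }
  x
}

Within the Gibbs sampler described at the start of Section 8.2, such a sweep would alternate with a draw of θ from p(θ | x^T, y^T).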
8.2.1.2 Multimove Update for the States
While single-move samplers are easy to implement, the resulting MCMC algorithms can
mix slowly if there is strong dependence in the state process. Updating the states in a
multimove fashion could be an alternative approach to overcome this problem. Ideally, we
would update the whole state process in one move. The simulation smoother of de Jong
and Shephard (1995) can be used to sample the states.
In situations where it is not possible to update the whole state process, Shephard
and Pitt (1997) and Watanabe (2004) propose sampling random blocks of the disturbances
w_{r+1,s} = (w_{r+1}, ..., w_s)′ (equivalently, x_{r+1,s}), using as proposal transition density
a second-order Taylor expansion of l_t = log p(y_t | ζ_t) in the full conditional of
w_{r+1,s}, where h(ζ_t) = d_t = z_t′ x_t. The proposal density is then the multivariate normal
with pseudo-observations

ŷ_t = z_t′ x̂_t + V̂_t l_t′(ζ̂_t)   and   V̂_t = −{l_t″(ζ̂_t)}^{−1},   for t = r+1, ..., s−1 and t = T.

For t = s < T, we have

ŷ_t = V̂_t [ z_t {l_t′(ζ̂_t) − l_t″(ζ̂_t) z_t′ x̂_t} + G_{t+1}′ W^{−1} x_{t+1} ]   and
V̂_t = { G_{t+1}′ W^{−1} G_{t+1} − l_t″(ζ̂_t) z_t z_t′ }^{−1}.

Then the linear SSM with pseudo-observations ŷ_t is defined as

ŷ_t = z_t′ x_t + ν_t,   t = r+1, ..., s−1 and t = T,
ŷ_t = x_t + ν_t,        t = s < T,                              (8.13)
x_t = G_t x_{t−1} + w_t,   for all t.
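For concreteness, in the Poisson log-linear case l_t(ζ_t) = y_t ζ_t − exp(ζ_t) up to an additive constant, so that l_t′(ζ_t) = y_t − exp(ζ_t) and l_t″(ζ_t) = −exp(ζ_t). The R sketch below (an illustration only, assuming ζ̂_t = z_t′ x̂_t) computes the interior-point pseudo-observations ŷ_t and variances V̂_t that feed the approximating linear SSM (8.13).

## Minimal sketch: interior-point pseudo-observations for the Poisson log-linear
## case, where l'(zeta) = y - exp(zeta) and l''(zeta) = -exp(zeta).
pseudo_obs <- function(y, zeta_hat) {
  V_hat <- exp(-zeta_hat)                            # -1 / l''(zeta_hat)
  y_hat <- zeta_hat + V_hat * (y - exp(zeta_hat))    # zeta_hat + V_hat * l'(zeta_hat)
  list(y_hat = y_hat, V_hat = V_hat)
}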