where $\nu_t$ and $w_t$ are all independent with $\nu_t \sim N(0, \hat{V}_t)$ and $w_t \sim N(0, W_t)$. Notice that sampling from this distribution ($g$) is the same as sampling $w_{r+1,s}$ given $x_r$, $x_{s+1}$ and $\hat{y}_{r+1}, \ldots, \hat{y}_s$ in the above model, which is possible by using the de Jong and Shephard (1995) simulation smoother. Since the distribution $f$ of the disturbances is not bounded by $g$, the Metropolis–Hastings acceptance–rejection algorithm samples from $f$, as recommended by Chib and Greenberg (1995). The expansion blocks $\hat{w}_{r+1,s}(x_{r+1,s})$ are selected as follows. Once an initial expansion block is selected, the auxiliary observations $\hat{y}_t$ are calculated. Next, application of the Kalman filter and a disturbance smoother to the linear Gaussian SSM with the artificial $\hat{y}_t$ yields the mean of $x_{r+1,s}$ conditional on $\hat{x}_{r+1,s}$. By repeating the procedure until the smoothed estimates converge, we obtain the posterior mode of $x_{r+1,s}$. According to Shephard and Pitt (1997), the blocks are selected randomly.
Gamerman (1998) suggests the use of a proposal transition density very similar to that of Shephard and Pitt (1997), based on a reparametrization of the model in terms of the system disturbances and sampling from these distributions. The proposal density is the full conditional distribution of $w_t$ in the model with the modified observational equation in (8.13). The reparametrization rewrites the link function in terms of the system disturbances as $g(\mu_t) = \alpha_t = z_t \sum_{j=1}^{t} G_{tj} w_j$ with $w_t \sim N(0, W_t)$, $t = 2, \ldots, T$, and $w_1 \sim N(a_1, R_1)$, if $G_t = G$ for all $t$.
8.3 Sequential Monte Carlo
Since the seminal work by Gordon et al. (1993), SMC methods—also known as particle filter algorithms—have gained popularity as a generalization of the importance sampling algorithm. The auxiliary particle filter (Pitt and Shephard, 1999) is another generalization of SMC. Since SMC methods are employed for filtering and smoothing given the parameters, the parameters should first be estimated before estimating the state vector. Recently, Andrieu et al. (2010) proposed the particle MCMC algorithm, which combines two algorithms: Metropolis–Hastings and SMC methods. They showed that if the likelihood is unbiasedly estimated by SMC and is plugged into a Metropolis–Hastings algorithm, then the parameters and states can be sampled from the correct posterior distribution. Nonetheless, designing a Metropolis–Hastings algorithm and tuning it can be cumbersome, especially for some state space models. It is possible to use adaptive Metropolis–Hastings sampling schemes as in Pitt et al. (2012) to overcome this problem. The first scheme is the adaptive random walk Metropolis sampling proposed by Roberts and Rosenthal (2009) (see references therein), and the second one is the adaptive independent Metropolis–Hastings sampling algorithm proposed by Giordani and Kohn (2010).

We show below the combination of the particle filter proposed by Gordon et al. (1993) and the adaptive random walk Metropolis sampling proposed by Roberts and Rosenthal (2009), which enables us to draw from the exact posterior distribution of the parameter vector, including the states. We should keep in mind that there are other more efficient combinations of particle filtering and adaptive sampling methods.
8.3.1 Particle Filter
We describe the particle filter by Gordon et al. (1993). Suppose that we have samples $x_{t-1}^k \sim p(x_{t-1} \mid y_1, \ldots, y_{t-1}, \theta)$ for $k = 1, \ldots, K$. First, we take the sample $x_t^k \sim p(x_t \mid x_{t-1}^k, \theta)$, for
$k = 1, \ldots, K$, which gives an approximation of $p(x_t \mid y_1, \ldots, y_{t-1}, \theta)$—the prediction density.
We can compute the corresponding weights and probabilities by
$$\delta_t^k = p\left(y_t \mid x_t^k, \theta\right), \quad \text{and} \quad \omega_t^k = \frac{\delta_t^k}{\sum_{j=1}^{K} \delta_t^j}.$$
The ltering density p(x
t
|y
1
, ..., y
t
, ) can be approximated by
x
t
k
, ω
k
,thatis,
t
k=1
K
p(x
t
|y
t
, ) = ω
k
t
x
t
x
k
, (8.14)
t
k=1
where is the Dirac function.
The next step is to resample from this mass function to obtain an equally weighted sample, which we call $x_t^k$ for $k = 1, \ldots, K$. We can use multinomial sampling in this resampling step, although stratified sampling reduces the variance of the simulated likelihood. The true likelihood is in fact estimated by the simulated likelihood that is given by
$$\hat{p}(y_t \mid y_1, \ldots, y_{t-1}, \theta) = \frac{1}{K} \sum_{k=1}^{K} p\left(y_t \mid x_t^k, \theta\right) = \frac{1}{K} \sum_{k=1}^{K} \delta_t^k. \qquad (8.15)$$
We now move to the next time step.
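The steps above translate almost line by line into code. The following is a minimal sketch of this bootstrap particle filter in R; the functions r_init(), r_transition(), and d_obs() are placeholders (not part of any package) that the user must supply for the particular state space model at hand.

# Minimal bootstrap particle filter (Gordon et al., 1993) for a scalar state.
# r_init(K, theta): draws K particles from the initial state distribution.
# r_transition(x, theta): propagates each particle through p(x_t | x_{t-1}, theta).
# d_obs(y, x, theta): evaluates p(y_t | x_t, theta) for each particle.
particle_filter <- function(y, theta, K, r_init, r_transition, d_obs) {
  x <- r_init(K, theta)                      # particles at time 0
  loglik <- 0
  for (t in seq_along(y)) {
    x_pred <- r_transition(x, theta)         # prediction step
    delta  <- d_obs(y[t], x_pred, theta)     # unnormalized weights delta_t^k
    loglik <- loglik + log(mean(delta))      # simulated likelihood, cf. (8.15)
    omega  <- delta / sum(delta)             # normalized weights omega_t^k, cf. (8.14)
    idx    <- sample.int(K, K, replace = TRUE, prob = omega)  # multinomial resampling
    x      <- x_pred[idx]                    # equally weighted filtered sample
  }
  list(loglik = loglik, particles = x)
}

# Illustrative call for a Poisson observation with a Gaussian random-walk state:
set.seed(1)
pf <- particle_filter(
  y = rpois(50, lambda = 3), theta = list(W = 0.1), K = 1000,
  r_init       = function(K, th) rnorm(K, 0, 1),
  r_transition = function(x, th) rnorm(length(x), x, sqrt(th$W)),
  d_obs        = function(y, x, th) dpois(y, lambda = exp(x))
)

Stratified or systematic resampling could replace sample.int() here to reduce the variance of the simulated likelihood, as noted above.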
8.3.2 Adaptive Random Walk Metropolis Sampling
The posterior distribution $p(\theta \mid Y)$ is our target density from which we wish to draw a sample. However, it is computationally difficult to do so directly and we use the Metropolis–Hastings algorithm. Given an initial $\theta^0$, we then generate $\theta^j$ for $j \geq 1$ from the proposal density $g_j(\theta; \theta^*)$, where $\theta^*$ represents previous iteration values of $\theta$. Define $\theta_p^j$ as the proposed value of $\theta^j$ generated from $g_j(\theta; \theta^{j-1})$. We then take $\theta^j = \theta_p^j$ with probability
$$\alpha\left(\theta^{j-1}; \theta_p^j\right) = \min\left\{1, \frac{p\left(\theta_p^j \mid Y\right) g_j\left(\theta^{j-1}; \theta_p^j\right)}{p\left(\theta^{j-1} \mid Y\right) g_j\left(\theta_p^j; \theta^{j-1}\right)}\right\}, \qquad (8.16)$$
and take $\theta^j = \theta^{j-1}$ otherwise. Under some regularity conditions (Tierney, 1994), the sequence $\{\theta^j, j = 1, \ldots, n\}$ converges as $n \to \infty$ to draws from the target density $p(\theta \mid Y)$.
The adaptive random walk Metropolis proposal of Roberts and Rosenthal (2009) is
$$g_j\left(\theta; \theta^{j-1}\right) = \gamma_j\, \phi_d\left(\theta \mid \theta^{j-1}, \eta_1 \Sigma_1\right) + (1 - \gamma_j)\, \phi_d\left(\theta \mid \theta^{j-1}, \eta_2 \Sigma_{2j}\right), \qquad (8.17)$$
where $d$ is the dimension of $\theta$ and $\phi_d(x \mid \mu, \Sigma)$ is a multivariate $d$-dimensional normal density with mean $\mu$ and covariance matrix $\Sigma$. In (8.17), $\gamma_j = 1$ for $j \leq k$, with $k \geq 1$ representing the initial iterations, and $\gamma_j = 0.05$ for $j > k$; $\eta_1 = 0.1^2/d$, which makes the sampler move
locally in small steps; $\eta_2 = 2.38^2/d$, which is optimal (Roberts et al., 1997) when the posterior distribution is a multivariate normal; $\Sigma_1$ is a constant covariance matrix that may be derived from an estimate of the parameters or may simply be the identity matrix; $\Sigma_{2j}$ is the sample covariance matrix of the first $j - 1$ iterates (the adaptive step).
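A minimal R sketch of this sampler is given below, assuming a user-supplied function log_post() that returns the log of the (possibly estimated) posterior density up to a constant; the function name and the small jitter added before the Cholesky factorization are our own choices, not part of the original algorithm. Because both mixture components in (8.17) are centered at the current value, the proposal is symmetric and the ratio in (8.16) reduces to the posterior ratio.

# Sketch of adaptive random walk Metropolis sampling based on (8.16)-(8.17).
# log_post(theta): log posterior density (up to a constant), supplied by the user.
adaptive_rwm <- function(log_post, theta0, n_iter, k = 100, Sigma1 = NULL) {
  d <- length(theta0)
  if (is.null(Sigma1)) Sigma1 <- diag(d)     # fixed covariance, e.g. the identity
  eta1 <- 0.1^2 / d                          # small local steps
  eta2 <- 2.38^2 / d                         # optimal scaling (Roberts et al., 1997)
  draws <- matrix(NA_real_, n_iter, d)
  theta <- theta0
  lp    <- log_post(theta)
  for (j in 1:n_iter) {
    if (j <= k || runif(1) < 0.05) {         # gamma_j = 1 for j <= k, 0.05 afterwards
      Sigma <- eta1 * Sigma1
    } else {
      Sigma2j <- cov(draws[1:(j - 1), , drop = FALSE])   # adaptive step
      Sigma   <- eta2 * Sigma2j + 1e-10 * diag(d)        # jitter keeps chol() stable
    }
    prop    <- theta + drop(rnorm(d) %*% chol(Sigma))    # random walk proposal
    lp_prop <- log_post(prop)
    if (log(runif(1)) < lp_prop - lp) {      # symmetric proposal: posterior ratio only
      theta <- prop
      lp    <- lp_prop
    }
    draws[j, ] <- theta
  }
  draws
}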
On one hand, the posterior distribution is considered to be approximated by $\hat{p}(\theta \mid Y) \propto \hat{L}(Y; \theta)\, p(\theta)$. On the other hand, we can take all the random variables used in the particle filter as a vector $u$ of uniforms and write the posterior distribution in an augmented space as $p(\theta, u \mid Y) \propto L(Y; \theta, u)\, p(\theta)\, p(u)$. We can plug this into a Metropolis–Hastings scheme such as the adaptive random walk Metropolis to draw a sample from $p(\theta \mid Y) \propto L(Y; \theta)\, p(\theta)$. See Andrieu et al. (2010) and Pitt et al. (2012) for more details. There are efficient ways of learning sequentially about $\theta$, if conditional sufficient statistics exist (Carvalho et al., 2010).
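To make the combination concrete, the sketch below plugs the simulated likelihood (8.15) from the particle_filter() sketch of Section 8.3.1 into the adaptive_rwm() sketch above. The Poisson local-level model, the log-variance parametrization, the prior, and the simulated data y_obs are all illustrative choices, not part of the original references.

# Sketch of particle marginal Metropolis-Hastings: the particle-filter estimate
# of the log likelihood replaces the exact likelihood inside the adaptive
# random walk Metropolis sampler. The stored value lp in adaptive_rwm() is the
# estimate for the current draw, which is what keeps the scheme exact.
set.seed(2)
y_obs <- rpois(50, lambda = 3)               # placeholder data, purely illustrative

log_post <- function(theta) {
  W  <- exp(theta)                           # theta = log W keeps the variance positive
  pf <- particle_filter(
    y = y_obs, theta = list(W = W), K = 500,
    r_init       = function(K, th) rnorm(K, 0, 1),
    r_transition = function(x, th) rnorm(length(x), x, sqrt(th$W)),
    d_obs        = function(y, x, th) dpois(y, lambda = exp(x))
  )
  pf$loglik + dnorm(theta, 0, 10, log = TRUE)  # estimated log likelihood + log prior
}

draws <- adaptive_rwm(log_post, theta0 = 0, n_iter = 5000)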
8.4 INLA
The integrated nested Laplace approximation, hereafter denoted by INLA, is a deterministic approach developed by Rue et al. (2009) to perform approximate Bayesian inference in the wide class of latent Gaussian models (Rue and Held, 2005). The INLA methodology takes advantage of the hierarchical structure of latent Gaussian models and combines a series of deterministic approximations to obtain posterior marginals of the unknown parameters in the model as well as other summary, predictive, and validation measures of interest.
As mentioned in Martins et al. (2013), implementation of the INLA methodology requires some expertise in numerical methods and computer programming in order to achieve efficient computing times. The R (R Core Team, 2013) package INLA, hereafter denoted as R-INLA, was developed to overcome this challenge, and provides a user-friendly interface to the INLA methodology, allowing it to be used routinely, even by those who are not interested in the implementation details behind the program.
The models dened by (8.1)–(8.3) belong to the class of latent Gaussian models for which
INLA was originally designed. We refer to Rue et al. (2009) for examples, which are out-
side this scope. Looking at (8.1 and 8.2), we see that π(y
i
|η
i
(x), θ
1
) = EF(g
1
(z
i
x
i
), φ),
i = 1, ..., T with θ
1
= φ. From (8.3) we can write
$$p(x \mid \theta_2) = p(x_0 \mid \theta_2) \prod_{i=1}^{T} p(x_i \mid x_{i-1}, \theta_2),$$
where $x_0 \mid \theta_2 \sim N(0, W_0)$ and $x_i \mid x_{i-1}, \theta_2 \sim N(G_i x_{i-1}, W)$. Since $x_0 \mid \theta_2$ and $x_i \mid x_{i-1}, \theta_2$, $i = 1, \ldots, T$, are Gaussian, it can be shown that $x \mid \theta_2 \sim N(0, Q^{-1}(\theta_2))$, with precision matrix given by
$$Q(\theta_2) = \begin{pmatrix}
W_0^{-1} + G_1^T W^{-1} G_1 & -G_1^T W^{-1} & 0 & \cdots & 0 & 0 \\
-W^{-1} G_1 & W^{-1} + G_2^T W^{-1} G_2 & -G_2^T W^{-1} & \cdots & 0 & 0 \\
0 & -W^{-1} G_2 & W^{-1} + G_3^T W^{-1} G_3 & \ddots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -W^{-1} G_{T-1} & W^{-1} + G_T^T W^{-1} G_T & -G_T^T W^{-1} \\
0 & 0 & 0 & \cdots & -W^{-1} G_T & W^{-1}
\end{pmatrix}$$
and $\theta_2$ consists of the unknown parameters within the variance–covariance matrix $W$ and the matrices $G_i$, $i = 1, \ldots, T$. Therefore, $x \mid \theta_2$ is a latent Gaussian model with a sparse precision matrix $Q(\theta_2)$, also known as a Gaussian Markov Random Field (GMRF) (Rue and Held, 2005).
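The band structure of $Q(\theta_2)$ is easy to see in the scalar case. The R sketch below builds the precision matrix for scalar states with a constant $G_i = G$; the function name and the dense-matrix representation are ours for illustration (INLA itself stores $Q$ in sparse form).

# Precision matrix Q(theta_2) for scalar states x_0, ..., x_T with G_i = G,
# W0 = var(x_0), W = system variance. Requires T >= 2.
dlm_precision <- function(T, G, W, W0) {
  Q <- matrix(0, T + 1, T + 1)
  Q[1, 1] <- 1 / W0 + G^2 / W                # block for x_0
  for (i in 2:T) Q[i, i] <- 1 / W + G^2 / W  # blocks for x_1, ..., x_{T-1}
  Q[T + 1, T + 1] <- 1 / W                   # block for x_T
  for (i in 1:T) {                           # off-diagonal blocks -G/W
    Q[i, i + 1] <- -G / W
    Q[i + 1, i] <- -G / W
  }
  Q
}

Q <- dlm_precision(T = 5, G = 0.9, W = 0.5, W0 = 10)  # tridiagonal, hence sparse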
Since the dynamic models of interest in this chapter can be written as
p(y, x, θ) = p(y|x, θ)p(x|θ)p(θ),
we will describe the INLA methodology in Section 8.4.1, and illustrate how to use the
R-INLA package through examples in Section 8.4.2.
8.4.1 INLA Methodology
For the hierarchical model described earlier, the joint posterior distribution is given by
$$p(x, \theta \mid y) \propto p(\theta)\, p(x \mid \theta) \prod_{i=1}^{n_d} p(y_i \mid \eta_i(x), \theta)
\propto p(\theta)\, |Q(\theta)|^{n/2} \exp\left\{-\frac{1}{2}\, x^T Q(\theta)\, x + \sum_{i=1}^{n_d} \log p(y_i \mid x_i, \theta)\right\},$$
and the posterior marginals of interest can be written as
$$p(x_i \mid y) = \int p(x_i \mid \theta, y)\, p(\theta \mid y)\, d\theta, \quad i = 1, \ldots, n, \qquad (8.18)$$
$$p(\theta_j \mid y) = \int p(\theta \mid y)\, d\theta_{-j}, \quad j = 1, \ldots, m. \qquad (8.19)$$
INLA provides approximations $\tilde{p}(\theta \mid y)$ and $\tilde{p}(x_i \mid \theta, y)$ to $p(\theta \mid y)$ and $p(x_i \mid \theta, y)$, plugs them into (8.18) and (8.19), and uses numerical integration to obtain the approximated posterior marginals $\tilde{p}(x_i \mid y)$ and $\tilde{p}(\theta_j \mid y)$ of interest.
The approximation used for the joint posterior of the hyperparameters $p(\theta \mid y)$ is
$$\tilde{p}(\theta \mid y) \propto \left.\frac{p(x, \theta, y)}{p_G(x \mid \theta, y)}\right|_{x = x^{*}(\theta)}, \qquad (8.20)$$
where $p_G(x \mid \theta, y)$ is a Gaussian approximation to the full conditional of $x$, $p(x \mid \theta, y)$, obtained by matching the modal configuration and the curvature at the mode, and $x^{*}(\theta)$ is the mode of the full conditional for $x$, for a given $\theta$. Expression (8.20) is equivalent to the Laplace approximation of a marginal posterior distribution (Tierney and Kadane, 1986), and it is exact if $p(x \mid y, \theta)$ is Gaussian.
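As a toy numerical illustration of (8.20), the R sketch below evaluates the Laplace approximation to $p(\theta \mid y)$ for a single Poisson count $y$ with a scalar latent $x \sim N(0, 1/\theta)$, where $\theta$ is a precision. The model, the gamma prior, and the function names are illustrative choices; the Gaussian approximation is obtained by a few Newton steps that match the mode and curvature of the full conditional of $x$.

# Toy version of (8.20): y | x ~ Poisson(exp(x)), x | prec ~ N(0, 1/prec).
laplace_marginal <- function(prec, y) {
  ljoint <- function(x)                      # log p(x, theta, y) up to a constant
    dgamma(prec, shape = 1, rate = 1, log = TRUE) +
    dnorm(x, 0, sqrt(1 / prec), log = TRUE) +
    dpois(y, exp(x), log = TRUE)
  x <- log(y + 0.5)                          # starting value for the Newton iterations
  for (it in 1:50) {
    grad <- -prec * x + y - exp(x)           # d/dx of log p(x | prec, y)
    hess <- -prec - exp(x)                   # curvature (always negative here)
    x    <- x - grad / hess                  # Newton step towards the mode x*(theta)
  }
  lpG <- dnorm(x, mean = x, sd = sqrt(-1 / hess), log = TRUE)  # p_G at its own mode
  ljoint(x) - lpG                            # log p~(theta | y) up to a constant
}

# Unnormalized log posterior of the precision on a grid (the grid exploration idea):
prec_grid <- seq(0.1, 5, by = 0.1)
lpost <- sapply(prec_grid, laplace_marginal, y = 7)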
For $p(x_i \mid \theta, y)$, three options are available, and they vary in terms of speed and accuracy. The fastest option, $p_G(x_i \mid \theta, y)$, is to use the marginals of the Gaussian approximation $p_G(x \mid \theta, y)$, which is already computed when evaluating expression (8.20). The only extra cost in obtaining $p_G(x_i \mid \theta, y)$ is to compute the marginal variances from the sparse precision matrix of $p_G(x \mid \theta, y)$; see Rue et al. (2009) for details. The Gaussian approximation often gives reasonable results, but it may contain errors in the location and/or errors due to its lack of skewness (Rue and Martino, 2007). A more accurate approach is to again use a Laplace approximation, denoted by $p_{LA}(x_i \mid \theta, y)$, with a form similar to (8.20), that is,
$$p_{LA}(x_i \mid \theta, y) \propto \left.\frac{p(x, \theta, y)}{p_{GG}(x_{-i} \mid x_i, \theta, y)}\right|_{x_{-i} = x_{-i}^{*}(x_i, \theta)}, \qquad (8.21)$$
where $x_{-i}$ represents the vector $x$ with its $i$th element excluded, $p_{GG}(x_{-i} \mid x_i, \theta, y)$ is the Gaussian approximation to $x_{-i} \mid x_i, \theta, y$, and $x_{-i}^{*}(x_i, \theta)$ is the modal configuration. A third option, $p_{SLA}(x_i \mid \theta, y)$, called the simplified Laplace approximation, is obtained by doing a Taylor expansion on the numerator and denominator of (8.21) up to third order, thus correcting the Gaussian approximation for location and skewness at a much lower cost when compared to $p_{LA}(x_i \mid \theta, y)$. We refer to Rue et al. (2009) for a detailed description of the Gaussian, Laplace, and simplified Laplace approximations to $p(x_i \mid \theta, y)$.
Finally, once we have the approximations $\tilde{p}(\theta \mid y)$ and $\tilde{p}(x_i \mid \theta, y)$ described earlier, the integrals in (8.18) and (8.19) are numerically approximated by discretizing the $\theta$ space through a grid exploration of $\tilde{p}(\theta \mid y)$. Details about this grid exploration can be found in Martins et al. (2013).
8.4.2 R-INLA through Examples
The syntax for the R-INLA package is based on the built-in glm function in R, and a basic call starts with

formula = y ~ a + b + a:b + c*d + f(idx1, model1, ...)
    + f(idx2, model2, ...),

where formula describes the structured additive linear predictor $\eta(x)$. Here, y is the response variable, and the term a + b + a:b + c*d holds a similar meaning as in the built-in glm function in R and is thus responsible for the fixed effects specification. The f() terms specify the general Gaussian random effects components of the model. In this case we say that both idx1 and idx2 are latent building blocks that are combined together to form a joint latent Gaussian model of interest. Once the linear predictor is specified, a basic call to fit the model with R-INLA takes the following form:

result = inla(formula, data = data.frame(y, a, b, c, d, idx1, idx2),
    family = "gaussian").