where $\nu_t$ and $w_t$ are all independent with $\nu_t \sim N(0, \hat{V}_t)$ and $w_t \sim N(0, W_t)$. Notice that sampling from this distribution ($g$) is the same as sampling $w_{r+1,s}$ given $x_r$, $x_{s+1}$ and $\hat{y}_{r+1}, \ldots, \hat{y}_s$ in the above model, which is possible by using the de Jong and Shephard (1995) simulation smoother. Since the distribution $f$ of the disturbances is not bounded by $g$, the Metropolis–Hastings acceptance–rejection algorithm samples from $f$, as recommended by Chib and Greenberg (1995). The expansion blocks $\hat{w}_{r+1,s}(x_{r+1,s})$ are selected as follows. Once an initial expansion block is selected, the auxiliary observations $\hat{y}_t$ are calculated. Next, application of the Kalman filter and a disturbance smoother to the linear Gaussian SSM with the artificial $\hat{y}_t$ yields the mean of $x_{r+1,s}$ conditional on $\hat{x}_{r+1,s}$. By repeating the procedure until the smoothed estimates converge, we obtain the posterior mode of $x_{r+1,s}$. According to Shephard and Pitt (1997), the blocks are selected randomly.
Gamerman (1998) suggests the use of a proposal transition density very similar to that of Shephard and Pitt (1997), based on a reparametrization of the model in terms of the system disturbances and sampling from these distributions. The proposal density is the full conditional distribution of $w_t$ in the model with the modified observational equation in (8.13). The reparametrization rewrites the link function in terms of the system disturbances as $g(\mu_t) = \alpha_t = z_t \sum_{j=1}^{t} G_{tj} w_j$ with $w_t \sim N(0, W_t)$, $t = 2, \ldots, T$, and $w_1 \sim N(a_1, R_1)$, if $G_t = G$ for all $t$.
8.3 Sequential Monte Carlo
Since the seminal work by Gordon et al. (1993), SMC methods—also known as particle filter algorithms—have gained popularity as a generalization of the importance sampling algorithm. The auxiliary particle filter (Pitt and Shephard, 1999) is another generalization of SMC. Since SMC methods are employed for filtering and smoothing given the parameters, the parameters should first be estimated before estimating the state vector. Recently, Andrieu et al. (2010) proposed the particle MCMC algorithm, which combines two algorithms: Metropolis–Hastings and SMC methods. They showed that if the likelihood is unbiasedly estimated by SMC and is plugged into a Metropolis–Hastings algorithm, then the parameters and states can be sampled from the correct posterior distribution. Nonetheless, designing a Metropolis–Hastings algorithm and tuning it can be cumbersome, especially for some state space models. It is possible to use adaptive Metropolis–Hastings sampling schemes as in Pitt et al. (2012) to overcome this problem. The first scheme is the adaptive random walk Metropolis sampling proposed by Roberts and Rosenthal (2009) (see references therein), and the second one is the adaptive independent Metropolis–Hastings sampling algorithm proposed by Giordani and Kohn (2010).

We show below the combination of the particle filter proposed by Gordon et al. (1993) and the adaptive random walk Metropolis sampling proposed by Roberts and Rosenthal (2009), which enables us to draw from the exact posterior distribution of the parameter vector, including the states. We should keep in mind that there are other more efficient combinations of particle filtering and adaptive sampling methods.
8.3.1 Particle Filter
We describe the particle filter by Gordon et al. (1993). Suppose that we have samples $x_{t-1}^k \sim p(x_{t-1} \mid y_1, \ldots, y_{t-1}, \theta)$ for $k = 1, \ldots, K$. First, we take the sample $x_t^k \sim p(x_t \mid x_{t-1}^k, \theta)$, for
$k = 1, \ldots, K$, which gives an approximation of $p(x_t \mid y_1, \ldots, y_{t-1}, \theta)$—the prediction density.
We can compute the corresponding weights and probabilities by
$$\delta_t^k = p\left(y_t \mid x_t^k, \theta\right), \quad \text{and} \quad \omega_t^k = \frac{\delta_t^k}{\sum_{j=1}^{K} \delta_t^j}.$$
The ltering density p(x
t
|y
1
, ..., y
t
, ) can be approximated by
x
t
k
, ω
k
,thatis,
t
k=1
K
p(x
t
|y
t
, ) = ω
k
t
x
t
x
k
, (8.14)
t
k=1
where is the Dirac function.
The next step is to resample from this mass function to obtain an equally weighted sample, which we call $x_t^k$ for $k = 1, \ldots, K$. We can use multinomial sampling in this resampling step, although stratified sampling reduces the variance of the simulated likelihood. The true likelihood is in fact estimated by the simulated likelihood that is given by
$$\hat{p}(y_t \mid y_1, \ldots, y_{t-1}, \theta) = \frac{1}{K} \sum_{k=1}^{K} p\left(y_t \mid x_t^k, \theta\right) = \frac{1}{K} \sum_{k=1}^{K} \delta_t^k. \qquad (8.15)$$
We now move to the next time step.
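The steps above translate almost line by line into code. The following is a minimal sketch of this bootstrap particle filter in R; the functions r_init(), r_transition(), and d_obs() are placeholders (not part of any package) that the user must supply for the particular state space model at hand.

# Minimal bootstrap particle filter (Gordon et al., 1993) for a scalar state.
# r_init(K, theta): draws K particles from the initial state distribution.
# r_transition(x, theta): propagates each particle through p(x_t | x_{t-1}, theta).
# d_obs(y, x, theta): evaluates p(y_t | x_t, theta) for each particle.
particle_filter <- function(y, theta, K, r_init, r_transition, d_obs) {
  x <- r_init(K, theta)                      # particles at time 0
  loglik <- 0
  for (t in seq_along(y)) {
    x_pred <- r_transition(x, theta)         # prediction step
    delta  <- d_obs(y[t], x_pred, theta)     # unnormalized weights delta_t^k
    loglik <- loglik + log(mean(delta))      # simulated likelihood, cf. (8.15)
    omega  <- delta / sum(delta)             # normalized weights omega_t^k, cf. (8.14)
    idx    <- sample.int(K, K, replace = TRUE, prob = omega)  # multinomial resampling
    x      <- x_pred[idx]                    # equally weighted filtered sample
  }
  list(loglik = loglik, particles = x)
}

# Illustrative call for a Poisson observation with a Gaussian random-walk state:
set.seed(1)
pf <- particle_filter(
  y = rpois(50, lambda = 3), theta = list(W = 0.1), K = 1000,
  r_init       = function(K, th) rnorm(K, 0, 1),
  r_transition = function(x, th) rnorm(length(x), x, sqrt(th$W)),
  d_obs        = function(y, x, th) dpois(y, lambda = exp(x))
)

Stratified or systematic resampling could replace sample.int() here to reduce the variance of the simulated likelihood, as noted above.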
8.3.2 Adaptive Random Walk Metropolis Sampling
The posterior distribution $p(\theta \mid Y)$ is our target density from which we wish to draw a sample. However, it is computationally difficult to do so directly and we use the Metropolis–Hastings algorithm. Given an initial $\theta^0$, we then generate $\theta^j$ for $j \geq 1$ from the proposal density $g_j(\theta; \theta^*)$, where $\theta^*$ represents previous iteration values of $\theta$. Define $\theta_p^j$ as the proposed value of $\theta^j$ generated from $g_j(\theta; \theta^{j-1})$. We then take $\theta^j = \theta_p^j$ with probability
$$\alpha\left(\theta^{j-1}; \theta_p^j\right) = \min\left\{1, \frac{p\left(\theta_p^j \mid Y\right) g_j\left(\theta^{j-1}; \theta_p^j\right)}{p\left(\theta^{j-1} \mid Y\right) g_j\left(\theta_p^j; \theta^{j-1}\right)}\right\}, \qquad (8.16)$$
and take $\theta^j = \theta^{j-1}$ otherwise. Under some regularity conditions (Tierney, 1994), the sequence $\{\theta^j, j = 1, \ldots, n\}$ converges as $n \to \infty$ to draws from the target density $p(\theta \mid Y)$.
The adaptive random walk Metropolis proposal of Roberts and Rosenthal (2009) is
$$g_j\left(\theta; \theta^{j-1}\right) = \gamma_j\, \phi_d\left(\theta \mid \theta^{j-1}, \eta_1 \Sigma_1\right) + (1 - \gamma_j)\, \phi_d\left(\theta \mid \theta^{j-1}, \eta_2 \Sigma_{2j}\right), \qquad (8.17)$$
where $d$ is the dimension of $\theta$ and $\phi_d(x \mid \mu, \Sigma)$ is a multivariate $d$-dimensional normal density with mean $\mu$ and covariance matrix $\Sigma$. In (8.17), $\gamma_j = 1$ for $j \leq k$, with $k \geq 1$ representing the initial iterations, and $\gamma_j = 0.05$ for $j > k$; $\eta_1 = 0.1^2/d$, which makes the sampler move
locally in small steps; $\eta_2 = 2.38^2/d$, which is optimal (Roberts et al., 1997) when the posterior distribution is a multivariate normal; $\Sigma_1$ is a constant covariance matrix that may be derived from an estimate of the parameters or may simply be the identity matrix; $\Sigma_{2j}$ is the sample covariance matrix of the first $j - 1$ iterates (the adaptive step).
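A minimal R sketch of this sampler is given below, assuming a user-supplied function log_post() that returns the log of the (possibly estimated) posterior density up to a constant; the function name and the small jitter added before the Cholesky factorization are our own choices, not part of the original algorithm. Because both mixture components in (8.17) are centered at the current value, the proposal is symmetric and the ratio in (8.16) reduces to the posterior ratio.

# Sketch of adaptive random walk Metropolis sampling based on (8.16)-(8.17).
# log_post(theta): log posterior density (up to a constant), supplied by the user.
adaptive_rwm <- function(log_post, theta0, n_iter, k = 100, Sigma1 = NULL) {
  d <- length(theta0)
  if (is.null(Sigma1)) Sigma1 <- diag(d)     # fixed covariance, e.g. the identity
  eta1 <- 0.1^2 / d                          # small local steps
  eta2 <- 2.38^2 / d                         # optimal scaling (Roberts et al., 1997)
  draws <- matrix(NA_real_, n_iter, d)
  theta <- theta0
  lp    <- log_post(theta)
  for (j in 1:n_iter) {
    if (j <= k || runif(1) < 0.05) {         # gamma_j = 1 for j <= k, 0.05 afterwards
      Sigma <- eta1 * Sigma1
    } else {
      Sigma2j <- cov(draws[1:(j - 1), , drop = FALSE])   # adaptive step
      Sigma   <- eta2 * Sigma2j + 1e-10 * diag(d)        # jitter keeps chol() stable
    }
    prop    <- theta + drop(rnorm(d) %*% chol(Sigma))    # random walk proposal
    lp_prop <- log_post(prop)
    if (log(runif(1)) < lp_prop - lp) {      # symmetric proposal: posterior ratio only
      theta <- prop
      lp    <- lp_prop
    }
    draws[j, ] <- theta
  }
  draws
}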
On one hand, the posterior distribution is considered to be approximated by $\hat{p}(\theta \mid Y) \propto \hat{L}(Y; \theta)\, p(\theta)$. On the other hand, we can take all the random variables used in the particle filter as a vector $u$ of uniforms and write the posterior distribution in an augmented space as $p(\theta, u \mid Y) \propto L(Y; \theta, u)\, p(\theta)\, p(u)$. We can plug this into a Metropolis–Hastings scheme such as the adaptive random walk Metropolis to draw a sample from $p(\theta \mid Y) \propto L(Y; \theta)\, p(\theta)$. See Andrieu et al. (2010) and Pitt et al. (2012) for more details. There are efficient ways of learning sequentially about $\theta$, if conditional sufficient statistics exist (Carvalho et al., 2010).
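To make the combination concrete, the sketch below plugs the simulated likelihood (8.15) from the particle_filter() sketch of Section 8.3.1 into the adaptive_rwm() sketch above. The Poisson local-level model, the log-variance parametrization, the prior, and the simulated data y_obs are all illustrative choices, not part of the original references.

# Sketch of particle marginal Metropolis-Hastings: the particle-filter estimate
# of the log likelihood replaces the exact likelihood inside the adaptive
# random walk Metropolis sampler. The stored value lp in adaptive_rwm() is the
# estimate for the current draw, which is what keeps the scheme exact.
set.seed(2)
y_obs <- rpois(50, lambda = 3)               # placeholder data, purely illustrative

log_post <- function(theta) {
  W  <- exp(theta)                           # theta = log W keeps the variance positive
  pf <- particle_filter(
    y = y_obs, theta = list(W = W), K = 500,
    r_init       = function(K, th) rnorm(K, 0, 1),
    r_transition = function(x, th) rnorm(length(x), x, sqrt(th$W)),
    d_obs        = function(y, x, th) dpois(y, lambda = exp(x))
  )
  pf$loglik + dnorm(theta, 0, 10, log = TRUE)  # estimated log likelihood + log prior
}

draws <- adaptive_rwm(log_post, theta0 = 0, n_iter = 5000)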
8.4 INLA
The integrated nested Laplace approximation, hereafter denoted by INLA, is a deterministic approach developed by Rue et al. (2009) to perform approximate Bayesian inference in the wide class of latent Gaussian models (Rue and Held, 2005). The INLA methodology takes advantage of the hierarchical structure of latent Gaussian models and combines a series of deterministic approximations to obtain posterior marginals of the unknown parameters in the model as well as other summary, predictive, and validation measures of interest.
As mentioned in Martins et al. (2013), implementation of the INLA methodology requires some expertise in numerical methods and computer programming in order to achieve efficient computing times. The R (R Core Team, 2013) package INLA, hereafter denoted as R-INLA, was developed to overcome this challenge, and provides a user-friendly interface to the INLA methodology, allowing it to be used routinely, even by those who are not interested in the implementation details behind the program.
The models dened by (8.1)–(8.3) belong to the class of latent Gaussian models for which
INLA was originally designed. We refer to Rue et al. (2009) for examples, which are out-
side this scope. Looking at (8.1 and 8.2), we see that π(y
i
|η
i
(x), θ
1
) = EF(g
1
(z
i
x
i
), φ),
i = 1, ..., T with θ
1
= φ. From (8.3) we can write
$$p(x \mid \theta_2) = p(x_0 \mid \theta_2) \prod_{i=1}^{T} p(x_i \mid x_{i-1}, \theta_2),$$
where $x_0 \mid \theta_2 \sim N(0, W_0)$ and $x_i \mid x_{i-1}, \theta_2 \sim N(G_i x_{i-1}, W)$. Since $x_0 \mid \theta_2$ and $x_i \mid x_{i-1}, \theta_2$, $i = 1, \ldots, T$, are Gaussian, it can be shown that $x \mid \theta_2 \sim N(0, Q^{-1}(\theta_2))$, with precision matrix given by
$$Q(\theta_2) = \begin{pmatrix}
W_0^{-1} + G_1^T W^{-1} G_1 & -G_1^T W^{-1} & 0 & \cdots & 0 & 0 \\
-W^{-1} G_1 & W^{-1} + G_2^T W^{-1} G_2 & -G_2^T W^{-1} & \cdots & 0 & 0 \\
0 & -W^{-1} G_2 & W^{-1} + G_3^T W^{-1} G_3 & \ddots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -W^{-1} G_{T-1} & W^{-1} + G_T^T W^{-1} G_T & -G_T^T W^{-1} \\
0 & 0 & 0 & \cdots & -W^{-1} G_T & W^{-1}
\end{pmatrix}$$
and $\theta_2$ consists of the unknown parameters within the variance–covariance matrix $W$ and the matrices $G_i$, $i = 1, \ldots, T$. Therefore, $x \mid \theta_2$ is a latent Gaussian model with a sparse precision matrix $Q(\theta_2)$, also known as a Gaussian Markov Random Field (GMRF) (Rue and Held, 2005).
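The band structure of $Q(\theta_2)$ is easy to see in the scalar case. The R sketch below builds the precision matrix for scalar states with a constant $G_i = G$; the function name and the dense-matrix representation are ours for illustration (INLA itself stores $Q$ in sparse form).

# Precision matrix Q(theta_2) for scalar states x_0, ..., x_T with G_i = G,
# W0 = var(x_0), W = system variance. Requires T >= 2.
dlm_precision <- function(T, G, W, W0) {
  Q <- matrix(0, T + 1, T + 1)
  Q[1, 1] <- 1 / W0 + G^2 / W                # block for x_0
  for (i in 2:T) Q[i, i] <- 1 / W + G^2 / W  # blocks for x_1, ..., x_{T-1}
  Q[T + 1, T + 1] <- 1 / W                   # block for x_T
  for (i in 1:T) {                           # off-diagonal blocks -G/W
    Q[i, i + 1] <- -G / W
    Q[i + 1, i] <- -G / W
  }
  Q
}

Q <- dlm_precision(T = 5, G = 0.9, W = 0.5, W0 = 10)  # tridiagonal, hence sparse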
Since the dynamic models of interest in this chapter can be written as
p(y, x, θ) = p(y|x, θ)p(x|θ)p(θ),
we will describe the INLA methodology in Section 8.4.1, and illustrate how to use the
R-INLA package through examples in Section 8.4.2.
8.4.1 INLA Methodology
For the hierarchical model described earlier, the joint posterior distribution is given by
$$p(x, \theta \mid y) \propto p(\theta)\, p(x \mid \theta) \prod_{i=1}^{n_d} p(y_i \mid \eta_i(x), \theta)
\propto p(\theta)\, |Q(\theta)|^{n/2} \exp\left\{-\frac{1}{2}\, x^T Q(\theta)\, x + \sum_{i=1}^{n_d} \log p(y_i \mid x_i, \theta)\right\},$$
and the posterior marginals of interest can be written as
$$p(x_i \mid y) = \int p(x_i \mid \theta, y)\, p(\theta \mid y)\, d\theta, \quad i = 1, \ldots, n, \qquad (8.18)$$
$$p(\theta_j \mid y) = \int p(\theta \mid y)\, d\theta_{-j}, \quad j = 1, \ldots, m. \qquad (8.19)$$
INLA provides approximations $\tilde{p}(\theta \mid y)$ and $\tilde{p}(x_i \mid \theta, y)$ to $p(\theta \mid y)$ and $p(x_i \mid \theta, y)$, plugs them into (8.18) and (8.19), and uses numerical integration to obtain the approximated posterior marginals $\tilde{p}(x_i \mid y)$ and $\tilde{p}(\theta_j \mid y)$ of interest.
The approximation used for the joint posterior of the hyperparameters $p(\theta \mid y)$ is
$$\tilde{p}(\theta \mid y) \propto \left.\frac{p(x, \theta, y)}{p_G(x \mid \theta, y)}\right|_{x = x^{*}(\theta)}, \qquad (8.20)$$
where $p_G(x \mid \theta, y)$ is a Gaussian approximation to the full conditional of $x$, $p(x \mid \theta, y)$, obtained by matching the modal configuration and the curvature at the mode, and $x^{*}(\theta)$ is the mode of the full conditional for $x$, for a given $\theta$. Expression (8.20) is equivalent to the Laplace approximation of a marginal posterior distribution (Tierney and Kadane, 1986), and it is exact if $p(x \mid y, \theta)$ is Gaussian.
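As a toy numerical illustration of (8.20), the R sketch below evaluates the Laplace approximation to $p(\theta \mid y)$ for a single Poisson count $y$ with a scalar latent $x \sim N(0, 1/\theta)$, where $\theta$ is a precision. The model, the gamma prior, and the function names are illustrative choices; the Gaussian approximation is obtained by a few Newton steps that match the mode and curvature of the full conditional of $x$.

# Toy version of (8.20): y | x ~ Poisson(exp(x)), x | prec ~ N(0, 1/prec).
laplace_marginal <- function(prec, y) {
  ljoint <- function(x)                      # log p(x, theta, y) up to a constant
    dgamma(prec, shape = 1, rate = 1, log = TRUE) +
    dnorm(x, 0, sqrt(1 / prec), log = TRUE) +
    dpois(y, exp(x), log = TRUE)
  x <- log(y + 0.5)                          # starting value for the Newton iterations
  for (it in 1:50) {
    grad <- -prec * x + y - exp(x)           # d/dx of log p(x | prec, y)
    hess <- -prec - exp(x)                   # curvature (always negative here)
    x    <- x - grad / hess                  # Newton step towards the mode x*(theta)
  }
  lpG <- dnorm(x, mean = x, sd = sqrt(-1 / hess), log = TRUE)  # p_G at its own mode
  ljoint(x) - lpG                            # log p~(theta | y) up to a constant
}

# Unnormalized log posterior of the precision on a grid (the grid exploration idea):
prec_grid <- seq(0.1, 5, by = 0.1)
lpost <- sapply(prec_grid, laplace_marginal, y = 7)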
For $p(x_i \mid \theta, y)$, three options are available, and they vary in terms of speed and accuracy. The fastest option, $p_G(x_i \mid \theta, y)$, is to use the marginals of the Gaussian approximation $p_G(x \mid \theta, y)$, which is already computed when evaluating expression (8.20). The only extra cost in obtaining $p_G(x_i \mid \theta, y)$ is to compute the marginal variances from the sparse precision matrix of $p_G(x \mid \theta, y)$; see Rue et al. (2009) for details. The Gaussian approximation often gives reasonable results, but it may contain errors in the location and/or errors due to its lack of skewness (Rue and Martino, 2007). A more accurate approach is to again use a Laplace approximation, denoted by $p_{LA}(x_i \mid \theta, y)$, with a form similar to (8.20), that is,
$$p_{LA}(x_i \mid \theta, y) \propto \left.\frac{p(x, \theta, y)}{p_{GG}(x_{-i} \mid x_i, \theta, y)}\right|_{x_{-i} = x_{-i}^{*}(x_i, \theta)}, \qquad (8.21)$$
where $x_{-i}$ represents the vector $x$ with its $i$th element excluded, $p_{GG}(x_{-i} \mid x_i, \theta, y)$ is the Gaussian approximation to $x_{-i} \mid x_i, \theta, y$, and $x_{-i}^{*}(x_i, \theta)$ is the modal configuration. A third option, $p_{SLA}(x_i \mid \theta, y)$, called the simplified Laplace approximation, is obtained by doing a Taylor expansion on the numerator and denominator of (8.21) up to third order, thus correcting the Gaussian approximation for location and skewness at a much lower cost when compared to $p_{LA}(x_i \mid \theta, y)$. We refer to Rue et al. (2009) for a detailed description of the Gaussian, Laplace, and simplified Laplace approximations to $p(x_i \mid \theta, y)$.
Finally, once we have the approximations $\tilde{p}(\theta \mid y)$ and $\tilde{p}(x_i \mid \theta, y)$ described earlier, the integrals in (8.18) and (8.19) are numerically approximated by discretizing the $\theta$ space through a grid exploration of $\tilde{p}(\theta \mid y)$. Details about this grid exploration can be found in Martins et al. (2013).
8.4.2 R-INLA through Examples
The syntax for the R-INLA package is based on the built-in glm function in R, and a basic call starts with

formula = y ~ a + b + a:b + c*d + f(idx1, model1, ...)
    + f(idx2, model2, ...),

where formula describes the structured additive linear predictor $\eta(x)$. Here, y is the response variable, and the term a + b + a:b + c*d holds a similar meaning as in the built-in glm function in R and is thus responsible for the fixed effects specification. The f() terms specify the general Gaussian random effects components of the model. In this case we say that both idx1 and idx2 are latent building blocks that are combined together to form a joint latent Gaussian model of interest. Once the linear predictor is specified, a basic call to fit the model with R-INLA takes the following form:

result = inla(formula, data = data.frame(y, a, b, c, d, idx1, idx2),
    family = "gaussian").