372 Handbook of Discrete-Valued Time Series
Let $\mathbf{Y}_t = (Y_{1,t}, \ldots, Y_{n,t})'$ denote the binary responses at all sites and a given time point $t$ for $t = 1, \ldots, T$ and a total of $T$ sampling time points. Then, the joint distribution of $\mathbf{Y}_2, \ldots, \mathbf{Y}_{T-1}$ conditional on $\mathbf{Y}_1$ and $\mathbf{Y}_T$ is

$$p(\mathbf{y}_2,\ldots,\mathbf{y}_{T-1}\mid\mathbf{y}_1,\mathbf{y}_T;\boldsymbol{\theta}) = c(\boldsymbol{\theta})^{-1}\exp\Bigg\{\sum_{t=2}^{T-1}\sum_{i=1}^{n}\sum_{k=0}^{p}\theta_k x_{k,i,t}\,y_{i,t} + \frac{1}{2}\sum_{t=2}^{T-1}\sum_{i=1}^{n}\sum_{j\in N_i}\theta_{p+1}\,y_{i,t}\,y_{j,t} + \sum_{t=2}^{T}\sum_{i=1}^{n}\theta_{p+2}\,y_{i,t}\,y_{i,t-1}\Bigg\}, \qquad (17.16)$$
where c(θ) is a normalizing constant and generally is intractable as it does not have an
analytical form.
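As a concrete illustration, the exponent of (17.16), i.e., the log of the unnormalized joint probability, can be evaluated directly; only $c(\boldsymbol{\theta})$ is intractable. A minimal Python sketch (the array layout and function name are illustrative assumptions, not from the chapter):

```python
def log_unnormalized_joint(y, x, theta, neighbors):
    """Exponent of (17.16): log p(y_2..y_{T-1} | y_1, y_T; theta) + log c(theta).

    y[i][t]: binary responses; x[k][i][t]: covariates (x[0] is the intercept);
    theta = (theta_0, ..., theta_p, theta_{p+1}, theta_{p+2}); neighbors[i] = N_i.
    """
    p1 = len(x)                      # number of regression terms, p + 1
    n, T = len(y), len(y[0])
    s = 0.0
    for t in range(1, T - 1):        # t = 2, ..., T-1 in the chapter's 1-based indexing
        for i in range(n):
            s += sum(theta[k] * x[k][i][t] for k in range(p1)) * y[i][t]
            # factor 1/2 because each same-time pair is visited from both ends
            s += 0.5 * theta[p1] * y[i][t] * sum(y[j][t] for j in neighbors[i])
    for t in range(1, T):            # temporal pairs run up to t = T
        for i in range(n):
            s += theta[p1 + 1] * y[i][t] * y[i][t - 1]
    return s
```

For a two-site chain this can be checked by hand against (17.16) term by term.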
The full conditional distribution (17.15) is symmetric in time and thus depends on both
past and future time points. For prediction at future time points, however, it would be
more sensible to have the conditional distributions depend only on the past. For example,
Zhu et al. (2008) proposed the following conditional distributions:
$$\begin{aligned} p(y_{i,t}\mid y_{j,t}: j\neq i,\; \mathbf{y}_{t'}: t' = t-1, t-2, \ldots) &= p(y_{i,t}\mid y_{j,t}: j\in N_i,\; \mathbf{y}_{t'}: t' = t-1, t-2, \ldots, t-S)\\ &= \frac{\exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t}\,y_{i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{i,t}\,y_{j,t} + \sum_{s=1}^{S}\theta_{p+1+s}\,y_{i,t}\,y_{i,t-s}\Big\}}{1 + \exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{j,t} + \sum_{s=1}^{S}\theta_{p+1+s}\,y_{i,t-s}\Big\}}, \end{aligned} \qquad (17.17)$$
where $i = 1, \ldots, n$, $t = S+1, \ldots, T$, and $S$ is the maximum temporal lag. The term in (17.17) is a full conditional distribution for a given time point $t$, even though it is not a full conditional distribution for all $i$ and $t$. The spatial neighborhood $N_i$ may be further partitioned into different orders of neighborhood. In particular, let $N_i = \bigcup_{l=1}^{L} N_i^{(l)}$, where $N_i^{(l)}$ denotes the $l$th-order neighborhood that comprises the $l$th nearest neighbors for $l = 1, \ldots, L$. Similar to the model specified via (17.16), the transition probability $p(\mathbf{y}_t\mid\mathbf{y}_{t'}: t' = t-1, \ldots, t-S)$ and the subsequent joint distribution function can be obtained. For ease of presentation, we focus on (17.16).
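The past-only conditional probability in (17.17) is simply a logistic function of the covariates, the same-time neighbor sum, and the $S$ lagged responses. A sketch under assumed array conventions (the function name and data layout are illustrative, not from the chapter):

```python
import math

def past_conditional_prob(i, t, y, x, theta, neighbors, S):
    """P(Y_{i,t} = 1 | same-time neighbors, S past lags), following (17.17).

    y[i][t]: responses; x[k][i][t]: covariates (x[0] is the intercept);
    theta = (theta_0, ..., theta_p, theta_{p+1}, theta_{p+2}, ..., theta_{p+1+S}).
    """
    p1 = len(x)                                           # p + 1 regression terms
    eta = sum(theta[k] * x[k][i][t] for k in range(p1))   # covariate part
    eta += theta[p1] * sum(y[j][t] for j in neighbors[i])               # spatial
    eta += sum(theta[p1 + s] * y[i][t - s] for s in range(1, S + 1))    # lags
    return 1.0 / (1.0 + math.exp(-eta))
```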
17.2.2 Statistical Inference
The intractable normalizing constant in the joint distribution function poses challenges in the statistical inference for the autologistic model with or without regression, an area of active research in the last couple of decades. While Besag (1975) originally proposed maximum pseudo-likelihood estimates (MPLEs), Huffer and Wu (1998) used Markov chain Monte Carlo (MCMC) methods to approximate the unknown normalizing constant and developed Monte Carlo maximum likelihood estimates (MCMLE) for spatial autologistic models. Further, Huang and Ogata (2002) generalized the pseudo-likelihood function and showed better performance of the resulting estimates than MPLE in terms of standard errors and efficiency relative to maximum likelihood estimates (MLEs). Berthelsen and Møller (2003) developed path sampling to approximate the ratio of unknown normalizing
constants in spatial point processes, which Zheng and Zhu (2008) used for computing the
MCMLE. Friel et al. (2009) proposed a fast computation method for the estimation of the
normalizing constant based on a reduced dependence approximation of the likelihood
function. Later, we describe statistical inference based on MPLE, MCMLE, and Bayesian
hierarchical modeling.
17.2.2.1 Maximum Pseudo-Likelihood Estimation
Maximum pseudo-likelihood, rst introduced by Besag (1975) for autologistic models, is a
popular approach to the statistical inference for autologistic regression models. The MPLE
is the value of θ that maximizes the product of the full conditional distributions,
$$\tilde{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} L_{PL}(\mathbf{Y};\boldsymbol{\theta}),$$
where the pseudo-likelihood function for a spatio-temporal autologistic model is
$$\begin{aligned} L_{PL}(\mathbf{Y};\boldsymbol{\theta}) &= \prod_{i,t} p(y_{i,t}\mid y_{i',t'}: (i',t')\neq(i,t))\\ &= \prod_{i,t} \frac{\exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t}\,y_{i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{i,t}\,y_{j,t} + \theta_{p+2}\,y_{i,t}(y_{i,t-1}+y_{i,t+1})\Big\}}{1 + \exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{j,t} + \theta_{p+2}(y_{i,t-1}+y_{i,t+1})\Big\}}. \end{aligned} \qquad (17.18)$$
Although the pseudo-likelihood function (17.18) is not the true likelihood except in the
trivial case of spatio-temporal independence, it can be shown that MPLEs are consistent
and asymptotically normal under suitable regularity conditions (Guyon, 1995).
To maximize the pseudo-likelihood function and obtain the MPLE of θ, it is straightforward to apply the standard logistic regression that assumes independence, which can be implemented by, for example, proc logistic in SAS or the function glm in R. The corresponding standard errors and approximate confidence intervals can be obtained by a parametric bootstrap. Specifically, in the parametric bootstrap, M resamples of spatio-temporal
binary responses are drawn according to the spatio-temporal autologistic regression model
using Gibbs sampling or perfect sampling. For each resample, an MPLE is computed and
the M resampled MPLEs are used to obtain an estimate of the variance of the MPLE based
on the original data. In particular, perfect sampling uses coupling and upon coalescence of
the coupled Markov chains, the resulting Monte Carlo samples are guaranteed to be from
the target distribution (e.g., Propp and Wilson, 1996; Møller, 1999).
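Because (17.18) is a product of logistic terms, the MPLE can be computed with any logistic regression routine applied to a design matrix that appends the spatial and temporal neighbor sums to the covariates. A minimal Python sketch (a hand-rolled Newton-Raphson fit stands in for proc logistic or glm; the data layout and small ridge term are illustrative assumptions):

```python
import numpy as np

def mple(y, x, neighbors, n_iter=30):
    """MPLE for the spatio-temporal autologistic model: each y[i, t] is
    regressed on its covariates, the sum of spatial-neighbor responses,
    and the sum of the two temporal-neighbor responses, via ordinary
    logistic regression fitted by Newton-Raphson.

    y: (n, T) 0/1 array; x: (p1, n, T) covariates (x[0] = intercept).
    Returns theta of length p1 + 2.
    """
    p1, n, T = x.shape
    rows, resp = [], []
    for t in range(T):
        for i in range(n):
            spatial = sum(y[j, t] for j in neighbors[i])
            temporal = (y[i, t - 1] if t > 0 else 0) + (y[i, t + 1] if t < T - 1 else 0)
            rows.append(list(x[:, i, t]) + [spatial, temporal])
            resp.append(y[i, t])
    X = np.asarray(rows, dtype=float)
    Y = np.asarray(resp, dtype=float)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):                       # Newton-Raphson updates
        mu = 1.0 / (1.0 + np.exp(-X @ theta))
        H = X.T @ (X * (mu * (1.0 - mu))[:, None]) + 1e-8 * np.eye(X.shape[1])
        theta = theta + np.linalg.solve(H, X.T @ (Y - mu))
    return theta
```

The bootstrap then reapplies this fit to each resample drawn from the fitted model.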
17.2.2.2 Monte Carlo Maximum Likelihood Estimation
The maximum pseudo-likelihood approach is computationally efficient, but is statistically less efficient than maximum likelihood (Gumpertz et al., 1997; Wu and Huffer, 1997; Zheng and Zhu, 2008). An alternative approach is Monte Carlo maximum likelihood (MCML), where the normalizing constant is approximated using MCMC so that the likelihood function can be maximized directly.
The likelihood function can be rewritten as

$$L(\mathbf{Y};\boldsymbol{\theta}) = p(\mathbf{y}_2,\ldots,\mathbf{y}_{T-1}\mid\mathbf{y}_1,\mathbf{y}_T;\boldsymbol{\theta}) = c(\boldsymbol{\theta})^{-1}\exp(\boldsymbol{\theta}'\mathbf{z}),$$
where

$$\mathbf{z} = \Bigg(\sum_{i,t} y_{i,t},\; \sum_{i,t} x_{1,i,t}\,y_{i,t},\; \ldots,\; \sum_{i,t} x_{p,i,t}\,y_{i,t},\; \frac{1}{2}\sum_{i,t}\sum_{i'\in N_i} y_{i,t}\,y_{i',t},\; \sum_{i,t} y_{i,t}\,y_{i,t-1}\Bigg)'.$$
Based on a preselected parameter vector $\boldsymbol{\psi} = (\psi_0, \ldots, \psi_{p+2})'$, approximate the ratio of two normalizing constants via importance sampling by

$$\frac{c(\boldsymbol{\theta})}{c(\boldsymbol{\psi})} = E_{\boldsymbol{\psi}}\left[\frac{\exp(\boldsymbol{\theta}'\mathbf{z})}{\exp(\boldsymbol{\psi}'\mathbf{z})}\right] \approx M^{-1}\sum_{m=1}^{M}\frac{\exp(\boldsymbol{\theta}'\mathbf{z}_m)}{\exp(\boldsymbol{\psi}'\mathbf{z}_m)} = M^{-1}\sum_{m=1}^{M}\exp\{(\boldsymbol{\theta}-\boldsymbol{\psi})'\mathbf{z}_m\},$$
where $\mathbf{z}_m$ is $\mathbf{z}$ evaluated at the $m$th Monte Carlo sample of $\mathbf{Y}$ for $m = 1, \ldots, M$. Monte Carlo
samples of Y are generated from the joint distribution evaluated at ψ. Then the MLE can
be approximated by maximizing a rescaled version of the likelihood function
$$c(\boldsymbol{\psi})L(\mathbf{Y};\boldsymbol{\theta}) = \frac{c(\boldsymbol{\psi})}{c(\boldsymbol{\theta})}\exp(\boldsymbol{\theta}'\mathbf{z}) \approx \left[M^{-1}\sum_{m=1}^{M}\exp\{(\boldsymbol{\theta}-\boldsymbol{\psi})'\mathbf{z}_m\}\right]^{-1}\exp(\boldsymbol{\theta}'\mathbf{z}).$$
The variances of the estimates can be estimated by using the diagonal elements of the
inverse of the observed Fisher information matrix (Huffer and Wu, 1998; Geyer, 1994).
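The importance-sampling approximation of the ratio $c(\boldsymbol{\theta})/c(\boldsymbol{\psi})$ is a one-line computation once the statistics $\mathbf{z}_m$ have been sampled at $\boldsymbol{\psi}$; working on the log scale with the log-sum-exp trick avoids overflow. A sketch (the log-scale formulation is an implementation choice, not from the chapter):

```python
import numpy as np

def log_ratio_estimate(theta, psi, z_samples):
    """Estimate log{c(theta)/c(psi)} = log E_psi[exp{(theta - psi)'z}] by
    log of M^{-1} sum_m exp{(theta - psi)'z_m}, computed via log-sum-exp
    for numerical stability.  z_samples: (M, d) array of statistics z_m
    drawn from the model at the reference parameter psi."""
    v = z_samples @ (theta - psi)     # (M,) values (theta - psi)'z_m
    m = v.max()                       # shift by the max before exponentiating
    return m + np.log(np.mean(np.exp(v - m)))
```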
The MCMLE provides a good approximation of the MLE of the model parameters when the reference parameter ψ is close to the truth (Geyer and Thompson, 1992). The MPLE is a natural choice for the reference parameter. However, when the spatial or temporal dependence is strong, the MPLE can be far away from the MLE, in which case the MCMLE with the MPLE as the reference parameter may not exist and the iteration may lead to a sequence of estimates that drift off to infinity. In this case, we select ψ to be an approximation obtained by a stochastic approximation algorithm, namely the two-stage MCMC stochastic approximation algorithm proposed by Gu and Zhu (2001) for computing the MLEs of model parameters for a class of spatial models. In the first stage, the estimates are moved into a feasible region quickly by using large gain constants in the stochastic approximation; in the second stage, an optimal procedure is implemented with a stopping criterion chosen so that a desired precision can be obtained. The first stage of the algorithm yields ψ.
17.2.3 Bayesian Inference
Bayesian hierarchical modeling can be applied for the inference about spatio-temporal autologistic regression models. Møller et al. (2006) presented an auxiliary variable MCMC algorithm that allows the construction of a proposal distribution so that the normalizing constants cancel out in the Metropolis–Hastings (MH) ratio. Zheng and Zhu (2008) proposed a Bayesian approach for both model parameter inference and prediction at future time points using MCMC. They proposed an MH algorithm to generate Monte Carlo samples from the posterior distribution of the parameter θ, where the likelihood ratio in the acceptance probability is approximated by
$$\frac{p(\mathbf{y}_2,\ldots,\mathbf{y}_{T-1}\mid\mathbf{y}_1,\mathbf{y}_T;\boldsymbol{\theta}^*)}{p(\mathbf{y}_2,\ldots,\mathbf{y}_{T-1}\mid\mathbf{y}_1,\mathbf{y}_T;\boldsymbol{\theta})} = \frac{\exp\{\boldsymbol{\theta}^{*\prime}\mathbf{z}\}}{\exp\{\boldsymbol{\theta}'\mathbf{z}\}}\times\frac{c(\boldsymbol{\theta})/c(\boldsymbol{\psi})}{c(\boldsymbol{\theta}^*)/c(\boldsymbol{\psi})} \approx \exp\{(\boldsymbol{\theta}^*-\boldsymbol{\theta})'\mathbf{z}\}\,\frac{M^{-1}\sum_{m=1}^{M}\exp\{(\boldsymbol{\theta}-\boldsymbol{\psi})'\mathbf{z}_m\}}{M^{-1}\sum_{m=1}^{M}\exp\{(\boldsymbol{\theta}^*-\boldsymbol{\psi})'\mathbf{z}_m\}},$$

where $\boldsymbol{\theta}^*$ denotes the proposed parameter value.
Here, M Monte Carlo samples of Y need to be generated from the joint distribution $p(\mathbf{y}_2,\ldots,\mathbf{y}_{T-1}\mid\mathbf{y}_1,\mathbf{y}_T;\boldsymbol{\psi})$ evaluated at ψ, but only once at the beginning of the MH algorithm, which makes the algorithm efficient. For the MH algorithm, a good choice of the parameter vector ψ helps to speed up the convergence process. The closer ψ is to the posterior mode of θ, the better the results are. Further, the variance of the proposal distribution needs to be adjusted to ensure a reasonable acceptance probability in the MH algorithm (Gelman et al., 2003).
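The log of this approximate likelihood ratio can be computed from the observed statistic $\mathbf{z}$ and the precomputed $\mathbf{z}_m$. A sketch (function names are illustrative; a log-prior term would be added for the full posterior ratio):

```python
import numpy as np

def _lse(v):
    """Numerically stable log-sum-exp of a 1-D array."""
    m = v.max()
    return m + np.log(np.sum(np.exp(v - m)))

def mh_log_accept_ratio(theta_prop, theta_cur, psi, z_obs, z_samples):
    """Log of the approximate likelihood ratio in the MH acceptance
    probability: (theta* - theta)'z + log sum_m exp{(theta - psi)'z_m}
    - log sum_m exp{(theta* - psi)'z_m}, with z_samples drawn once from
    the model at the reference parameter psi."""
    return ((theta_prop - theta_cur) @ z_obs
            + _lse(z_samples @ (theta_cur - psi))
            - _lse(z_samples @ (theta_prop - psi)))
```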
Path sampling is an alternative way to calculate the ratio of two normalizing constants and is based on the following identity:

$$\ln\frac{c(\boldsymbol{\theta})}{c(\mathbf{0})} = \int_0^1 E_{\boldsymbol{\theta}(s)}\!\left[\frac{d\boldsymbol{\theta}(s)'}{ds}\,\mathbf{z}\right]ds,$$

where the expectation is with respect to the joint distribution evaluated at the parameter θ(s) along a path θ(s) = sθ for s ∈ [0, 1] from 0 to θ. However, the computation can be costly because multiple Monte Carlo samples of Y are required for computing the expectation.
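The identity can be checked numerically on a toy model where $c(\boldsymbol{\theta})$ is available in closed form: a single binary variable with $z = y$, so $c(\theta) = 1 + e^{\theta}$ and $E_{\theta(s)}[\theta y] = \theta e^{s\theta}/(1+e^{s\theta})$. A sketch (the trapezoidal grid and exact-expectation shortcut are illustrative assumptions; in practice the expectation is itself a Monte Carlo average):

```python
import numpy as np

def path_sampling_log_ratio(theta, expect_fn, n_grid=2001):
    """Approximate ln{c(theta)/c(0)} = int_0^1 E_{s*theta}[theta' z] ds
    by the trapezoidal rule; expect_fn(s) returns E_{theta(s)}[theta' z]."""
    s = np.linspace(0.0, 1.0, n_grid)
    vals = np.array([expect_fn(si) for si in s])
    h = s[1] - s[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule
```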
For the spatio-temporal autologistic regression model, Zheng and Zhu (2008) compared the performance of MPL, MCML, and Bayesian inference. They demonstrated that parameter inference via MPL can be statistically inefficient when spatial and/or temporal dependence is strong, whereas the statistical properties of the MCML are comparable to those of the Bayesian approach and the computation of MCML estimates is faster. Further, using Bayesian inference, the posterior distribution of the model parameters can be obtained and it becomes straightforward to construct credible bands at desired levels.
17.2.4 Prediction
Let $\tilde{\mathbf{Y}} = (\mathbf{Y}_{T+1}, \ldots, \mathbf{Y}_{T+T^*})'$ denote the responses at future time points $T+1, \ldots, T+T^*$ with $T^* \geq 1$. For prediction of $\tilde{\mathbf{Y}}$ based on model parameter estimates from MPL and MCML, a Gibbs sampler can be used to obtain the Monte Carlo samples of $\tilde{\mathbf{Y}}$ from

$$p(\tilde{\mathbf{y}}\mid\mathbf{y}_T,\mathbf{y}_{T+T^*+1};\boldsymbol{\theta}) \propto \exp\Bigg\{\sum_{t=T+1}^{T+T^*}\sum_{i=1}^{n}\sum_{k=0}^{p}\theta_k x_{k,i,t}\,y_{i,t} + \frac{1}{2}\sum_{t=T+1}^{T+T^*}\sum_{i=1}^{n}\sum_{j\in N_i}\theta_{p+1}\,y_{i,t}\,y_{j,t} + \sum_{t=T+1}^{T+T^*+1}\sum_{i=1}^{n}\theta_{p+2}\,y_{i,t}\,y_{i,t-1}\Bigg\}.$$
For prediction of $\tilde{\mathbf{Y}}$ in the Bayesian framework, the posterior predictive distribution of $\tilde{\mathbf{Y}}$ is

$$p(\tilde{\mathbf{y}}\mid\mathbf{y},\mathbf{y}_{T+T^*+1}) = \int p(\tilde{\mathbf{y}}\mid\mathbf{y},\mathbf{y}_{T+T^*+1};\boldsymbol{\theta})\,p(\boldsymbol{\theta}\mid\mathbf{y})\,d\boldsymbol{\theta}.$$

To draw Monte Carlo samples of $\tilde{\mathbf{Y}}$ from $p(\tilde{\mathbf{y}}\mid\mathbf{y},\mathbf{y}_{T+T^*+1})$, first draw θ from its posterior distribution $p(\boldsymbol{\theta}\mid\mathbf{y})$ and then, for each given θ, draw $\tilde{\mathbf{Y}}$ from $p(\tilde{\mathbf{y}}\mid\mathbf{y},\mathbf{y}_{T+T^*+1};\boldsymbol{\theta})$ using a Gibbs sampler (Zheng and Zhu, 2008).
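A Gibbs sampler for drawing $\tilde{\mathbf{Y}}$ (or resamples in the parametric bootstrap) sweeps over sites and time points, redrawing each $y_{i,t}$ from its full conditional. A minimal sketch (in-place updates and zero boundary handling are simplifying assumptions, not from the chapter):

```python
import numpy as np

def gibbs_sweep(y, x, theta, neighbors, rng):
    """One Gibbs sweep for simulating from the autologistic model: each
    y[i, t] is redrawn from its full conditional, a logistic probability
    in the covariates, spatial neighbors, and temporal neighbors
    (values outside the window are treated as 0)."""
    p1, n, T = x.shape
    for t in range(T):
        for i in range(n):
            eta = float(theta[:p1] @ x[:, i, t])                    # covariates
            eta += theta[p1] * sum(y[j, t] for j in neighbors[i])   # spatial
            eta += theta[p1 + 1] * ((y[i, t - 1] if t > 0 else 0)
                                    + (y[i, t + 1] if t < T - 1 else 0))
            y[i, t] = int(rng.random() < 1.0 / (1.0 + np.exp(-eta)))
    return y
```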
17.3 Centered Autologistic Regression Model
In the aforementioned autologistic regression models, the interpretation of model parameters is not straightforward (Caragea and Kaiser, 2009; Kaiser and Caragea, 2009). In the presence of positive spatial and temporal dependence, under the uncentered parameterization, the conditional expectation of $Y_{i,t}$ given its neighbors is
$$E(Y_{i,t}\mid Y_{i',t'} = y_{i',t'}: (i',t')\in N_{i,t}) = \frac{\exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{j,t} + \theta_{p+2}(y_{i,t-1}+y_{i,t+1})\Big\}}{1 + \exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{j,t} + \theta_{p+2}(y_{i,t-1}+y_{i,t+1})\Big\}}. \qquad (17.19)$$
The expectation (17.19) is larger than the expectation of $Y_{i,t}$ under independence,

$$\frac{\exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t}\Big\}}{1 + \exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t}\Big\}},$$

as long as $Y_{i,t}$ has nonzero spatial and/or temporal neighbors, and is never smaller. This may not be reasonable when most neighbors are zeros, as it biases the realizations toward 1. Hence, the interpretation of the dependence parameters is difficult. Further, the
marginal expectation of $Y_{i,t}$ (i.e., $E(Y_{i,t}\mid x_{k,i,t}, k = 1, \ldots, p)$) is greater than the expectation of $Y_{i,t}$ under independence. A simulation study in Wang (2013) showed that $E(Y_{i,t}\mid x_{k,i,t}, k = 1, \ldots, p)$ varies across different levels of spatial and temporal dependence for fixed regression coefficients. These findings make the interpretation of the regression coefficients unclear, since these coefficients are meant to reflect the effects of covariates and should have a consistent interpretation across varying dependence levels.
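The inflation effect is easy to verify numerically: with nonnegative dependence parameters and neighbor sums, the logistic argument in (17.19) can only exceed its independence counterpart. A small illustrative check (function name and parameter values are hypothetical):

```python
import math

def cond_expectation(reg_term, theta_sp, nbr_sum, theta_tm, temp_sum):
    """Conditional expectation as in (17.19), under the uncentered
    parameterization: a logistic function of the regression term plus
    the spatial and temporal dependence contributions."""
    eta = reg_term + theta_sp * nbr_sum + theta_tm * temp_sum
    return 1.0 / (1.0 + math.exp(-eta))

# With positive dependence, the conditional mean never drops below the
# independence expectation, whatever the neighbor configuration:
independence = cond_expectation(-1.0, 0.0, 0, 0.0, 0)
with_deps = cond_expectation(-1.0, 0.5, 3, 0.25, 2)
```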
For non-Gaussian Markov random field models of spatial lattice data, the idea of centered parameterization was first proposed by Kaiser and Cressie (1997) for a Winsorized Poisson conditional model. More recently, Kaiser and Caragea (2009) explored the centered parameterization for a general exponential family of Markov random field models. In particular, Caragea and Kaiser (2009) studied the centered parameterization for spatial autologistic regression models and showed that the centered parameterization overcomes the interpretation difficulties. Wang and Zheng (2013) extended this work to the case of spatio-temporal autologistic regression models.