372 Handbook of Discrete-Valued Time Series
Let $\mathbf{Y}_t = (Y_{1,t}, \ldots, Y_{n,t})'$ denote the binary responses at all sites and a given time point $t$ for $t = 1, \ldots, T$ and a total of $T$ sampling time points. Then, the joint distribution of $\mathbf{Y}_2, \ldots, \mathbf{Y}_{T-1}$ conditional on $\mathbf{Y}_1$ and $\mathbf{Y}_T$ is

$$p(\mathbf{y}_2,\ldots,\mathbf{y}_{T-1}\mid\mathbf{y}_1,\mathbf{y}_T;\boldsymbol{\theta}) = c(\boldsymbol{\theta})^{-1}\exp\Bigg\{\sum_{t=2}^{T-1}\sum_{i=1}^{n}\sum_{k=0}^{p}\theta_k x_{k,i,t}\,y_{i,t} + \frac{1}{2}\sum_{t=2}^{T-1}\sum_{i=1}^{n}\sum_{j\in N_i}\theta_{p+1}\,y_{i,t}\,y_{j,t} + \sum_{t=2}^{T}\sum_{i=1}^{n}\theta_{p+2}\,y_{i,t}\,y_{i,t-1}\Bigg\}, \qquad (17.16)$$
where c(θ) is a normalizing constant and generally is intractable as it does not have an
analytical form.
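As a concrete illustration, the exponent of (17.16), i.e., the log of the unnormalized joint probability, can be evaluated directly; only $c(\boldsymbol{\theta})$ is intractable. A minimal Python sketch (the array layout and function name are illustrative assumptions, not from the chapter):

```python
def log_unnormalized_joint(y, x, theta, neighbors):
    """Exponent of (17.16): log p(y_2..y_{T-1} | y_1, y_T; theta) + log c(theta).

    y[i][t]: binary responses; x[k][i][t]: covariates (x[0] is the intercept);
    theta = (theta_0, ..., theta_p, theta_{p+1}, theta_{p+2}); neighbors[i] = N_i.
    """
    p1 = len(x)                      # number of regression terms, p + 1
    n, T = len(y), len(y[0])
    s = 0.0
    for t in range(1, T - 1):        # t = 2, ..., T-1 in the chapter's 1-based indexing
        for i in range(n):
            s += sum(theta[k] * x[k][i][t] for k in range(p1)) * y[i][t]
            # factor 1/2 because each same-time pair is visited from both ends
            s += 0.5 * theta[p1] * y[i][t] * sum(y[j][t] for j in neighbors[i])
    for t in range(1, T):            # temporal pairs run up to t = T
        for i in range(n):
            s += theta[p1 + 1] * y[i][t] * y[i][t - 1]
    return s
```

For a two-site chain this can be checked by hand against (17.16) term by term.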
The full conditional distribution (17.15) is symmetric in time and thus depends on both
past and future time points. For prediction at future time points, however, it would be
more sensible to have the conditional distributions depend only on the past. For example,
Zhu et al. (2008) proposed the following conditional distributions:
$$\begin{aligned} p(y_{i,t}\mid y_{j,t}: j\neq i,\; \mathbf{y}_{t'}: t' = t-1, t-2, \ldots) &= p(y_{i,t}\mid y_{j,t}: j\in N_i,\; \mathbf{y}_{t'}: t' = t-1, t-2, \ldots, t-S)\\ &= \frac{\exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t}\,y_{i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{i,t}\,y_{j,t} + \sum_{s=1}^{S}\theta_{p+1+s}\,y_{i,t}\,y_{i,t-s}\Big\}}{1 + \exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{j,t} + \sum_{s=1}^{S}\theta_{p+1+s}\,y_{i,t-s}\Big\}}, \end{aligned} \qquad (17.17)$$
where $i = 1, \ldots, n$, $t = S+1, \ldots, T$, and $S$ is the maximum temporal lag. The term in (17.17) is a full conditional distribution for a given time point $t$, even though it is not a full conditional distribution for all $i$ and $t$. The spatial neighborhood $N_i$ may be further partitioned into different orders of neighborhood. In particular, let $N_i = \bigcup_{l=1}^{L} N_i^{(l)}$, where $N_i^{(l)}$ denotes the $l$th-order neighborhood that comprises the $l$th nearest neighbors for $l = 1, \ldots, L$. Similar to the model specified via (17.16), the transition probability $p(\mathbf{y}_t\mid\mathbf{y}_{t'}: t' = t-1, \ldots, t-S)$ and the subsequent joint distribution function can be obtained. For ease of presentation, we focus on (17.16).
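The past-only conditional probability in (17.17) is simply a logistic function of the covariates, the same-time neighbor sum, and the $S$ lagged responses. A sketch under assumed array conventions (the function name and data layout are illustrative, not from the chapter):

```python
import math

def past_conditional_prob(i, t, y, x, theta, neighbors, S):
    """P(Y_{i,t} = 1 | same-time neighbors, S past lags), following (17.17).

    y[i][t]: responses; x[k][i][t]: covariates (x[0] is the intercept);
    theta = (theta_0, ..., theta_p, theta_{p+1}, theta_{p+2}, ..., theta_{p+1+S}).
    """
    p1 = len(x)                                           # p + 1 regression terms
    eta = sum(theta[k] * x[k][i][t] for k in range(p1))   # covariate part
    eta += theta[p1] * sum(y[j][t] for j in neighbors[i])               # spatial
    eta += sum(theta[p1 + s] * y[i][t - s] for s in range(1, S + 1))    # lags
    return 1.0 / (1.0 + math.exp(-eta))
```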
17.2.2 Statistical Inference
The intractable normalizing constant in the joint distribution function poses challenges in the statistical inference for the autologistic model with or without regression, an area of active research in the last couple of decades. While Besag (1975) originally proposed maximum pseudo-likelihood estimates (MPLEs), Huffer and Wu (1998) used Markov chain Monte Carlo (MCMC) methods to approximate the unknown normalizing constant and developed Monte Carlo maximum likelihood estimates (MCMLE) for spatial autologistic models. Further, Huang and Ogata (2002) generalized the pseudo-likelihood function and showed better performance of the resulting estimates than MPLE in terms of standard errors and efficiency relative to maximum likelihood estimates (MLEs). Berthelsen and Møller (2003) developed path sampling to approximate the ratio of unknown normalizing
constants in spatial point processes, which Zheng and Zhu (2008) used for computing the
MCMLE. Friel et al. (2009) proposed a fast computation method for the estimation of the
normalizing constant based on a reduced dependence approximation of the likelihood
function. Later, we describe statistical inference based on MPLE, MCMLE, and Bayesian
hierarchical modeling.
17.2.2.1 Maximum Pseudo-Likelihood Estimation
Maximum pseudo-likelihood, rst introduced by Besag (1975) for autologistic models, is a
popular approach to the statistical inference for autologistic regression models. The MPLE
is the value of θ that maximizes the product of the full conditional distributions,
$$\tilde{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} L_{PL}(\mathbf{Y};\boldsymbol{\theta}),$$
where the pseudo-likelihood function for a spatio-temporal autologistic model is
$$\begin{aligned} L_{PL}(\mathbf{Y};\boldsymbol{\theta}) &= \prod_{i,t} p(y_{i,t}\mid y_{i',t'}: (i',t')\neq(i,t))\\ &= \prod_{i,t} \frac{\exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t}\,y_{i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{i,t}\,y_{j,t} + \theta_{p+2}\,y_{i,t}(y_{i,t-1}+y_{i,t+1})\Big\}}{1 + \exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{j,t} + \theta_{p+2}(y_{i,t-1}+y_{i,t+1})\Big\}}. \end{aligned} \qquad (17.18)$$
Although the pseudo-likelihood function (17.18) is not the true likelihood except in the
trivial case of spatio-temporal independence, it can be shown that MPLEs are consistent
and asymptotically normal under suitable regularity conditions (Guyon, 1995).
To maximize the pseudo-likelihood function and obtain the MPLE of θ, it is straightforward to apply the standard logistic regression that assumes independence, which can be implemented by, for example, proc logistic in SAS or the function glm in R. The corresponding standard errors and approximate confidence intervals can be obtained by a parametric bootstrap. Specifically, in the parametric bootstrap, M resamples of spatio-temporal
binary responses are drawn according to the spatio-temporal autologistic regression model
using Gibbs sampling or perfect sampling. For each resample, an MPLE is computed and
the M resampled MPLEs are used to obtain an estimate of the variance of the MPLE based
on the original data. In particular, perfect sampling uses coupling and upon coalescence of
the coupled Markov chains, the resulting Monte Carlo samples are guaranteed to be from
the target distribution (e.g., Propp and Wilson, 1996; Møller, 1999).
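Because (17.18) is a product of logistic terms, the MPLE can be computed with any logistic regression routine applied to a design matrix that appends the spatial and temporal neighbor sums to the covariates. A minimal Python sketch (a hand-rolled Newton-Raphson fit stands in for proc logistic or glm; the data layout and small ridge term are illustrative assumptions):

```python
import numpy as np

def mple(y, x, neighbors, n_iter=30):
    """MPLE for the spatio-temporal autologistic model: each y[i, t] is
    regressed on its covariates, the sum of spatial-neighbor responses,
    and the sum of the two temporal-neighbor responses, via ordinary
    logistic regression fitted by Newton-Raphson.

    y: (n, T) 0/1 array; x: (p1, n, T) covariates (x[0] = intercept).
    Returns theta of length p1 + 2.
    """
    p1, n, T = x.shape
    rows, resp = [], []
    for t in range(T):
        for i in range(n):
            spatial = sum(y[j, t] for j in neighbors[i])
            temporal = (y[i, t - 1] if t > 0 else 0) + (y[i, t + 1] if t < T - 1 else 0)
            rows.append(list(x[:, i, t]) + [spatial, temporal])
            resp.append(y[i, t])
    X = np.asarray(rows, dtype=float)
    Y = np.asarray(resp, dtype=float)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):                       # Newton-Raphson updates
        mu = 1.0 / (1.0 + np.exp(-X @ theta))
        H = X.T @ (X * (mu * (1.0 - mu))[:, None]) + 1e-8 * np.eye(X.shape[1])
        theta = theta + np.linalg.solve(H, X.T @ (Y - mu))
    return theta
```

The bootstrap then reapplies this fit to each resample drawn from the fitted model.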
17.2.2.2 Monte Carlo Maximum Likelihood Estimation
The maximum pseudo-likelihood approach is computationally efficient, but is statistically less efficient than maximum likelihood (Gumpertz et al., 1997; Wu and Huffer, 1997; Zheng and Zhu, 2008). An alternative approach is Monte Carlo maximum likelihood (MCML), where the normalizing constant is approximated using MCMC so that the likelihood function can be maximized directly.
The likelihood function can be rewritten as

$$L(\mathbf{Y};\boldsymbol{\theta}) = p(\mathbf{y}_2,\ldots,\mathbf{y}_{T-1}\mid\mathbf{y}_1,\mathbf{y}_T;\boldsymbol{\theta}) = c(\boldsymbol{\theta})^{-1}\exp(\boldsymbol{\theta}'\mathbf{z}),$$
where

$$\mathbf{z} = \Bigg(\sum_{i,t} y_{i,t},\; \sum_{i,t} x_{1,i,t}\,y_{i,t},\; \ldots,\; \sum_{i,t} x_{p,i,t}\,y_{i,t},\; \frac{1}{2}\sum_{i,t}\sum_{i'\in N_i} y_{i,t}\,y_{i',t},\; \sum_{i,t} y_{i,t}\,y_{i,t-1}\Bigg)'.$$
Based on a preselected parameter vector $\boldsymbol{\psi} = (\psi_0, \ldots, \psi_{p+2})'$, approximate the ratio of two normalizing constants via importance sampling by

$$\frac{c(\boldsymbol{\theta})}{c(\boldsymbol{\psi})} = E_{\boldsymbol{\psi}}\left[\frac{\exp(\boldsymbol{\theta}'\mathbf{z})}{\exp(\boldsymbol{\psi}'\mathbf{z})}\right] \approx M^{-1}\sum_{m=1}^{M}\frac{\exp(\boldsymbol{\theta}'\mathbf{z}_m)}{\exp(\boldsymbol{\psi}'\mathbf{z}_m)} = M^{-1}\sum_{m=1}^{M}\exp\{(\boldsymbol{\theta}-\boldsymbol{\psi})'\mathbf{z}_m\},$$
where $\mathbf{z}_m$ is $\mathbf{z}$ evaluated at the $m$th Monte Carlo sample of $\mathbf{Y}$ for $m = 1, \ldots, M$. Monte Carlo
samples of Y are generated from the joint distribution evaluated at ψ. Then the MLE can
be approximated by maximizing a rescaled version of the likelihood function
$$c(\boldsymbol{\psi})L(\mathbf{Y};\boldsymbol{\theta}) = \frac{c(\boldsymbol{\psi})}{c(\boldsymbol{\theta})}\exp(\boldsymbol{\theta}'\mathbf{z}) \approx \left[M^{-1}\sum_{m=1}^{M}\exp\{(\boldsymbol{\theta}-\boldsymbol{\psi})'\mathbf{z}_m\}\right]^{-1}\exp(\boldsymbol{\theta}'\mathbf{z}).$$
The variances of the estimates can be estimated by using the diagonal elements of the
inverse of the observed Fisher information matrix (Huffer and Wu, 1998; Geyer, 1994).
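The importance-sampling approximation of the ratio $c(\boldsymbol{\theta})/c(\boldsymbol{\psi})$ is a one-line computation once the statistics $\mathbf{z}_m$ have been sampled at $\boldsymbol{\psi}$; working on the log scale with the log-sum-exp trick avoids overflow. A sketch (the log-scale formulation is an implementation choice, not from the chapter):

```python
import numpy as np

def log_ratio_estimate(theta, psi, z_samples):
    """Estimate log{c(theta)/c(psi)} = log E_psi[exp{(theta - psi)'z}] by
    log of M^{-1} sum_m exp{(theta - psi)'z_m}, computed via log-sum-exp
    for numerical stability.  z_samples: (M, d) array of statistics z_m
    drawn from the model at the reference parameter psi."""
    v = z_samples @ (theta - psi)     # (M,) values (theta - psi)'z_m
    m = v.max()                       # shift by the max before exponentiating
    return m + np.log(np.mean(np.exp(v - m)))
```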
The MCMLE provides a good approximation of the MLE of the model parameters when the reference parameter ψ is close to the truth (Geyer and Thompson, 1992). The MPLE is a natural choice for the reference parameter. However, when the spatial or temporal dependence is strong, the MPLE can be far away from the MLE, in which case the MCMLE with the MPLE as the reference parameter may not exist and the iteration may lead to a sequence of estimates that drift off to infinity. In this case, we select ψ to be an approximation obtained by a stochastic approximation algorithm, namely the two-stage MCMC stochastic approximation algorithm proposed by Gu and Zhu (2001) for computing the MLEs of model parameters for a class of spatial models. In the first stage, the estimates are moved into a feasible region quickly by using large gain constants in the stochastic approximation; in the second stage, an optimal procedure is implemented with a stopping criterion chosen so that a desired precision can be obtained. The first stage of the algorithm yields ψ.
17.2.3 Bayesian Inference
Bayesian hierarchical modeling can be applied for the inference about spatio-temporal autologistic regression models. Møller et al. (2006) presented an auxiliary variable MCMC algorithm that allows the construction of a proposal distribution so that the normalizing constants cancel out in the Metropolis–Hastings (MH) ratio. Zheng and Zhu (2008) proposed a Bayesian approach for both model parameter inference and prediction at future time points using MCMC. They proposed an MH algorithm to generate Monte Carlo samples from the posterior distribution of the parameter θ, where the likelihood ratio in the acceptance probability is approximated by
$$\frac{p(\mathbf{y}_2,\ldots,\mathbf{y}_{T-1}\mid\mathbf{y}_1,\mathbf{y}_T;\boldsymbol{\theta}^*)}{p(\mathbf{y}_2,\ldots,\mathbf{y}_{T-1}\mid\mathbf{y}_1,\mathbf{y}_T;\boldsymbol{\theta})} = \frac{\exp\{\boldsymbol{\theta}^{*\prime}\mathbf{z}\}}{\exp\{\boldsymbol{\theta}'\mathbf{z}\}}\times\frac{c(\boldsymbol{\theta})/c(\boldsymbol{\psi})}{c(\boldsymbol{\theta}^*)/c(\boldsymbol{\psi})} \approx \exp\{(\boldsymbol{\theta}^*-\boldsymbol{\theta})'\mathbf{z}\}\,\frac{M^{-1}\sum_{m=1}^{M}\exp\{(\boldsymbol{\theta}-\boldsymbol{\psi})'\mathbf{z}_m\}}{M^{-1}\sum_{m=1}^{M}\exp\{(\boldsymbol{\theta}^*-\boldsymbol{\psi})'\mathbf{z}_m\}},$$

where $\boldsymbol{\theta}^*$ denotes the proposed parameter value.
Here, M Monte Carlo samples of Y need to be generated from the joint distribution $p(\mathbf{y}_2,\ldots,\mathbf{y}_{T-1}\mid\mathbf{y}_1,\mathbf{y}_T;\boldsymbol{\psi})$ evaluated at ψ, but only once at the beginning of the MH algorithm, which makes the algorithm efficient. For the MH algorithm, a good choice of the parameter vector ψ helps to speed up the convergence process. The closer ψ is to the posterior mode of θ, the better the results are. Further, the variance of the proposal distribution needs to be adjusted to ensure a reasonable acceptance probability in the MH algorithm (Gelman et al., 2003).
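The log of this approximate likelihood ratio can be computed from the observed statistic $\mathbf{z}$ and the precomputed $\mathbf{z}_m$. A sketch (function names are illustrative; a log-prior term would be added for the full posterior ratio):

```python
import numpy as np

def _lse(v):
    """Numerically stable log-sum-exp of a 1-D array."""
    m = v.max()
    return m + np.log(np.sum(np.exp(v - m)))

def mh_log_accept_ratio(theta_prop, theta_cur, psi, z_obs, z_samples):
    """Log of the approximate likelihood ratio in the MH acceptance
    probability: (theta* - theta)'z + log sum_m exp{(theta - psi)'z_m}
    - log sum_m exp{(theta* - psi)'z_m}, with z_samples drawn once from
    the model at the reference parameter psi."""
    return ((theta_prop - theta_cur) @ z_obs
            + _lse(z_samples @ (theta_cur - psi))
            - _lse(z_samples @ (theta_prop - psi)))
```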
Path sampling is an alternative way to calculate the ratio of two normalizing constants and is based on the following identity:

$$\ln\frac{c(\boldsymbol{\theta})}{c(\mathbf{0})} = \int_0^1 E_{\boldsymbol{\theta}(s)}\!\left[\frac{d\boldsymbol{\theta}(s)'}{ds}\,\mathbf{z}\right]ds,$$

where the expectation is with respect to the joint distribution evaluated at the parameter θ(s) along a path θ(s) = sθ for s ∈ [0, 1] from 0 to θ. However, the computation can be costly because multiple Monte Carlo samples of Y are required for computing the expectation.
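The identity can be checked numerically on a toy model where $c(\boldsymbol{\theta})$ is available in closed form: a single binary variable with $z = y$, so $c(\theta) = 1 + e^{\theta}$ and $E_{\theta(s)}[\theta y] = \theta e^{s\theta}/(1+e^{s\theta})$. A sketch (the trapezoidal grid and exact-expectation shortcut are illustrative assumptions; in practice the expectation is itself a Monte Carlo average):

```python
import numpy as np

def path_sampling_log_ratio(theta, expect_fn, n_grid=2001):
    """Approximate ln{c(theta)/c(0)} = int_0^1 E_{s*theta}[theta' z] ds
    by the trapezoidal rule; expect_fn(s) returns E_{theta(s)}[theta' z]."""
    s = np.linspace(0.0, 1.0, n_grid)
    vals = np.array([expect_fn(si) for si in s])
    h = s[1] - s[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule
```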
For the spatio-temporal autologistic regression model, Zheng and Zhu (2008) compared the performance of MPL, MCML, and Bayesian inference. They demonstrated that parameter inference via MPL can be statistically inefficient when spatial and/or temporal dependence is strong, whereas the statistical properties of the MCML are comparable to those of the Bayesian approach and the computation of MCML estimates is faster. Further, using Bayesian inference, the posterior distribution of the model parameters can be obtained and it becomes straightforward to construct credible bands at desired levels.
17.2.4 Prediction
Let $\tilde{\mathbf{Y}} = (\mathbf{Y}_{T+1}, \ldots, \mathbf{Y}_{T+T^*})'$ denote the responses at future time points $T+1, \ldots, T+T^*$ with $T^* \geq 1$. For prediction of $\tilde{\mathbf{Y}}$ based on model parameter estimates from MPL and MCML, a Gibbs sampler can be used to obtain the Monte Carlo samples of $\tilde{\mathbf{Y}}$ from

$$p(\tilde{\mathbf{y}}\mid\mathbf{y}_T,\mathbf{y}_{T+T^*+1};\boldsymbol{\theta}) \propto \exp\Bigg\{\sum_{t=T+1}^{T+T^*}\sum_{i=1}^{n}\sum_{k=0}^{p}\theta_k x_{k,i,t}\,y_{i,t} + \frac{1}{2}\sum_{t=T+1}^{T+T^*}\sum_{i=1}^{n}\sum_{j\in N_i}\theta_{p+1}\,y_{i,t}\,y_{j,t} + \sum_{t=T+1}^{T+T^*+1}\sum_{i=1}^{n}\theta_{p+2}\,y_{i,t}\,y_{i,t-1}\Bigg\}.$$
For prediction of $\tilde{\mathbf{Y}}$ in the Bayesian framework, the posterior predictive distribution of $\tilde{\mathbf{Y}}$ is

$$p(\tilde{\mathbf{y}}\mid\mathbf{y},\mathbf{y}_{T+T^*+1}) = \int p(\tilde{\mathbf{y}}\mid\mathbf{y},\mathbf{y}_{T+T^*+1};\boldsymbol{\theta})\,p(\boldsymbol{\theta}\mid\mathbf{y})\,d\boldsymbol{\theta}.$$

To draw Monte Carlo samples of $\tilde{\mathbf{Y}}$ from $p(\tilde{\mathbf{y}}\mid\mathbf{y},\mathbf{y}_{T+T^*+1})$, first draw θ from its posterior distribution $p(\boldsymbol{\theta}\mid\mathbf{y})$ and then, for each given θ, draw $\tilde{\mathbf{Y}}$ from $p(\tilde{\mathbf{y}}\mid\mathbf{y},\mathbf{y}_{T+T^*+1};\boldsymbol{\theta})$ using a Gibbs sampler (Zheng and Zhu, 2008).
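A Gibbs sampler for drawing $\tilde{\mathbf{Y}}$ (or resamples in the parametric bootstrap) sweeps over sites and time points, redrawing each $y_{i,t}$ from its full conditional. A minimal sketch (in-place updates and zero boundary handling are simplifying assumptions, not from the chapter):

```python
import numpy as np

def gibbs_sweep(y, x, theta, neighbors, rng):
    """One Gibbs sweep for simulating from the autologistic model: each
    y[i, t] is redrawn from its full conditional, a logistic probability
    in the covariates, spatial neighbors, and temporal neighbors
    (values outside the window are treated as 0)."""
    p1, n, T = x.shape
    for t in range(T):
        for i in range(n):
            eta = float(theta[:p1] @ x[:, i, t])                    # covariates
            eta += theta[p1] * sum(y[j, t] for j in neighbors[i])   # spatial
            eta += theta[p1 + 1] * ((y[i, t - 1] if t > 0 else 0)
                                    + (y[i, t + 1] if t < T - 1 else 0))
            y[i, t] = int(rng.random() < 1.0 / (1.0 + np.exp(-eta)))
    return y
```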
17.3 Centered Autologistic Regression Model
In the aforementioned autologistic regression models, the interpretation of model parameters is not straightforward (Caragea and Kaiser, 2009; Kaiser and Caragea, 2009). In the presence of positive spatial and temporal dependence, under the uncentered parameterization, the conditional expectation of $Y_{i,t}$ given its neighbors is
$$E(Y_{i,t}\mid Y_{i',t'} = y_{i',t'}: (i',t')\in N_{i,t}) = \frac{\exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{j,t} + \theta_{p+2}(y_{i,t-1}+y_{i,t+1})\Big\}}{1 + \exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t} + \sum_{j\in N_i}\theta_{p+1}\,y_{j,t} + \theta_{p+2}(y_{i,t-1}+y_{i,t+1})\Big\}}. \qquad (17.19)$$
The expectation (17.19) is larger than the expectation of $Y_{i,t}$ under independence,

$$\frac{\exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t}\Big\}}{1 + \exp\Big\{\sum_{k=0}^{p}\theta_k x_{k,i,t}\Big\}},$$

as long as $Y_{i,t}$ has nonzero spatial and/or temporal neighbors, and is never smaller. This may not be reasonable when most neighbors are zeros, as it biases the realizations toward 1. Hence, the interpretation of the dependence parameters is difficult. Further, the
marginal expectation of $Y_{i,t}$ (i.e., $E(Y_{i,t}\mid x_{k,i,t}, k = 1, \ldots, p)$) is greater than the expectation of $Y_{i,t}$ under independence. A simulation study in Wang (2013) showed that $E(Y_{i,t}\mid x_{k,i,t}, k = 1, \ldots, p)$ varies across different levels of spatial and temporal dependence for fixed regression coefficients. These findings make the interpretation of the regression coefficients unclear, since these coefficients are meant to reflect the effects of covariates and should have a consistent interpretation across varying dependence levels.
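The inflation effect is easy to verify numerically: with nonnegative dependence parameters and neighbor sums, the logistic argument in (17.19) can only exceed its independence counterpart. A small illustrative check (function name and parameter values are hypothetical):

```python
import math

def cond_expectation(reg_term, theta_sp, nbr_sum, theta_tm, temp_sum):
    """Conditional expectation as in (17.19), under the uncentered
    parameterization: a logistic function of the regression term plus
    the spatial and temporal dependence contributions."""
    eta = reg_term + theta_sp * nbr_sum + theta_tm * temp_sum
    return 1.0 / (1.0 + math.exp(-eta))

# With positive dependence, the conditional mean never drops below the
# independence expectation, whatever the neighbor configuration:
independence = cond_expectation(-1.0, 0.0, 0, 0.0, 0)
with_deps = cond_expectation(-1.0, 0.5, 3, 0.25, 2)
```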
For non-Gaussian Markov random field models of spatial lattice data, the idea of centered parameterization was first proposed by Kaiser and Cressie (1997) for a Winsorized Poisson conditional model. More recently, Kaiser and Caragea (2009) explored the centered parameterization for a general exponential family of Markov random field models. In particular, Caragea and Kaiser (2009) studied the centered parameterization for spatial autologistic regression models and showed that the centered parameterization overcomes the interpretation difficulties. Wang and Zheng (2013) extended this work to the case of spatio-temporal autologistic regression models.