15: Hierarchical Dynamic Generalized Linear Mixed Models for Discrete-Valued Spatio-Temporal Data (2/5)

332 Handbook of Discrete-Valued Time Series

15.3 Data Models for Discrete-Valued Spatio-Temporal Data

As previously alluded to, discrete-valued spatio-temporal data arise across a broad range of subject-matter disciplines, with the specific distribution chosen to facilitate the analysis under consideration. Although, in principle, the framework presented here can accommodate virtually any discrete-valued distribution, we briefly describe a few of the more popular distributions that arise in practice. The distributions displayed here are not meant to constitute an exhaustive list of potential data model distributions that could be employed. Instead, the distributions we describe are merely meant to demonstrate the rich class of discrete-valued spatio-temporal models that can be constructed under the BHM (latent Gaussian process) framework described in Section 15.2. A specific example using the Poisson distribution is considered in Section 15.5.2.

Again, although we mainly focus on count-valued spatio-temporal data, many other discrete-valued data models could be considered. For example, when considering spatio-temporal binary data it is natural to use logistic regression such that the conditional distribution of $\mathbf{Z}_t$ given the $n$-dimensional vector of probabilities $\mathbf{p}_t$ is Bernoulli; that is,

\[ \mathbf{Z}_t \mid \mathbf{p}_t \sim \text{ind. Bern}(\mathbf{H}_t \mathbf{p}_t). \]

In this case, it is natural to model $\mathbf{p}_t$ through the logit link (where, through an abuse of notation, we define $\text{logit}(\mathbf{p}_t) = \log\{\mathbf{p}_t / (\mathbf{1} - \mathbf{p}_t)\}$ to be the logit transform applied to each element of $\mathbf{p}_t$). Alternatively, the probit link function could be considered in place of the logit link, and in many cases when dealing with binary spatio-temporal data, use of a probit link function along with data augmentation will facilitate computation (Albert and Chib, 1993). Although not considered here, the Bernoulli data model also arises in the context of spatio-temporal auto-logistic models (Zhu and Zheng [2015; Chapter 17 in this volume]; Zhu et al., 2008) and agent-based models (Hooten and Wikle, 2010; Wikle and Hooten [2015; Chapter 16 in this volume]).
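
To make the logit data model concrete, the following Python sketch simulates binary observations at a single time point. It is illustrative only: it assumes every location is observed (so the mapping matrix is the identity), and the names `inv_logit`, `simulate_bernoulli`, and `y_t` (standing in for the latent Gaussian process on the logit scale) are hypothetical rather than taken from the chapter.

```python
import numpy as np


def inv_logit(y):
    """Elementwise inverse of logit(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(y, dtype=float)))


def simulate_bernoulli(y_t, rng):
    """Draw independent Bernoulli outcomes given latent logits y_t."""
    p_t = inv_logit(y_t)          # probabilities, one per location
    z_t = rng.binomial(1, p_t)    # one Bernoulli draw per location
    return p_t, z_t


rng = np.random.default_rng(0)
y_t = rng.normal(size=5)  # hypothetical latent values at n = 5 locations
p_t, z_t = simulate_bernoulli(y_t, rng)
```

Applying the logit transform to `p_t` recovers `y_t`, which is the sense in which the latent process is modeled "through the logit link."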

In contrast to spatio-temporal binary data, one could consider a polychotomous outcome (i.e., outcomes with more than two ordered categories). In this case, a natural data model distribution is the multinomial. This gives rise to a spatio-temporal multinomial logistic regression. That is, assuming the usual conditions for the $K$ category multinomial distribution, the data model is given by

\[ \mathbf{Z}_t \mid \mathbf{n}_t, \mathbf{p}_{1,t}, \ldots, \mathbf{p}_{K,t} \sim \text{ind. Mult}(\mathbf{n}_t, \mathbf{H}_t \mathbf{p}_{1,t}, \ldots, \mathbf{H}_t \mathbf{p}_{K,t}). \tag{15.5} \]

In (15.5), one possible model for $\mathbf{p}_{k,t}$ ($k = 1, \ldots, K$) is the multinomial logit (e.g., see Congdon, 2007; Arab et al., 2012). Under this construction, it is natural to model the underlying process, the multinomial logit of $\mathbf{p}_{k,t}$ ($k = 1, \ldots, K$), as a latent Gaussian spatio-temporal process.

There are several popular choices for the data model when considering count-valued data (e.g., Poisson, NegBin, CMP, etc.), with the specific choice often based on the level of dispersion and/or computational considerations. In most cases, the data are seen to be overdispersed (i.e., the variance is greater than the mean). Although not as common, there are also several examples of underdispersion (i.e., the variance is less than the mean) (Ridout and Besbeas, 2004). The latter case of underdispersion typically arises in situations where the observations occur as “rare events.” Finally, in practice, the case of equidispersion (i.e., the variance is equal to the mean) is rarely satisfied.

The Poisson distribution has become the de facto distribution when it comes to modeling spatio-temporal count-valued data and is the distribution we use for illustration (Section 15.5). Therefore, we defer detailed discussion of this distribution until Section 15.5. Although the model assumes equidispersion, the case of overdispersion is readily accommodated through a spatio-temporal random effect in a Gaussian latent process model for the logarithm of the Poisson intensity parameter. Assuming all locations are observed at each time point, there is no need to include a mapping matrix from the observations to the process model for the intensity, unless interest resides in an aggregate or other (possibly weighted) function of the underlying process. Letting $\boldsymbol{\lambda}_t$ denote the spatial intensity process at time $t$, a typical model specification for a spatio-temporal Poisson model is given by

\[ \mathbf{Z}_t \mid \boldsymbol{\lambda}_t \sim \text{ind. Pois}(\boldsymbol{\lambda}_t), \]

where $\log(\boldsymbol{\lambda}_t)$ can be specified similarly to the right-hand side of (15.4).

Another popular distribution for modeling overdispersed count-valued spatio-temporal data is the NegBin (Greene, 2008). In contrast to the Poisson data model, this data model has an explicit parameter that controls the level of overdispersion. Assuming the “intensity” parameter, $\boldsymbol{\lambda}_t$, and dispersion parameter, $\nu$, are greater than zero, the model can be specified as

\[ \mathbf{Z}_t \mid \boldsymbol{\lambda}_t, \nu \sim \text{ind. NegBin}(\boldsymbol{\lambda}_t, \nu), \]

where $\log(\boldsymbol{\lambda}_t)$ can be specified similarly to the right-hand side of (15.4) and $\nu$ (or $\log(\nu)$) can be given an appropriate hyperprior. For a random variable $Z$, it is well known that the expected value and variance of this distribution are given by $E(Z) = \lambda$ and $\text{Var}(Z) = \lambda + \nu\lambda^2$ (Greene, 2008), and thus, this distribution readily accommodates processes where the variance exceeds the mean. Finally, for this distribution, it is possible to let the dispersion parameter be space or time varying; however, only overdispersion can be accommodated.
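
As a rough numerical check of the NegBin mean and variance, the sketch below maps an intensity `lam` and dispersion `nu` onto the `(n, p)` arguments of NumPy's `Generator.negative_binomial`. The mapping `n = 1/nu`, `p = 1/(1 + nu*lam)` is an assumption chosen here so that the mean is `lam` and the variance is `lam + nu*lam**2`; the function name and parameter values are hypothetical.

```python
import numpy as np


def simulate_negbin(lam, nu, size, rng):
    """Draw NegBin counts with mean lam and variance lam + nu * lam**2."""
    n = 1.0 / nu                  # NumPy's "number of successes" (may be non-integer)
    p = 1.0 / (1.0 + nu * lam)    # NumPy's success probability
    return rng.negative_binomial(n, p, size=size)


rng = np.random.default_rng(2)
lam, nu = 4.0, 0.5
z = simulate_negbin(lam, nu, 200_000, rng)
sample_mean = z.mean()   # theoretical value: lam = 4
sample_var = z.var()     # theoretical value: lam + nu * lam**2 = 12
```

Because `nu` must be positive, the variance can never fall below the mean under this parameterization, which is why only overdispersion is available.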

A less common distribution used to model count data is the CMP distribution. As discussed in Wu et al. (2013), the CMP distribution can be used as a suitable data model distribution when considering count-valued spatio-temporal data. The advantage of this data model distribution is that it flexibly allows for both spatial (or temporal) overdispersion and underdispersion within the same model. Let $\boldsymbol{\lambda}_t$ and $\nu$ be positive and denote the CMP “intensity” and dispersion parameters, respectively. For this distribution, $\nu = 1$ corresponds to the Poisson distribution, whereas $\nu < 1$ and $\nu > 1$ correspond, respectively, to overdispersed and underdispersed distributions. Further, the CMP distribution generalizes to the geometric distribution (when $\nu = 0$ and $\lambda < 1$) and the Bernoulli distribution (as $\nu \to \infty$) in the limiting cases (Shmueli et al., 2005). A spatio-temporal version of the CMP distribution is given by

\[ \mathbf{Z}_t \mid \boldsymbol{\lambda}_t, \nu \sim \text{ind. CMP}(\boldsymbol{\lambda}_t, \nu), \]

where $\log(\boldsymbol{\lambda}_t)$ can be specified similarly to the right-hand side of (15.4) and $\log(\nu)$ is given a suitable hyperprior. Alternatively, as proposed by Wu et al. (2013), a dynamic model for the dispersion parameter could be imposed. Importantly, this distribution involves a normalizing constant that must be computed numerically since it involves the summation of an infinite series. For certain combinations of intensity and dispersion parameters, calculation of the normalizing constant can be computationally intensive. For these cases, Minka et al. (2003) derived an asymptotic approximation to the normalizing constant which is accurate for $\lambda > 10^{\nu}$. In contrast, Wu et al. (2013) proposed further improvements to computing the normalizing constant by taking advantage of parallel computing through Open Multiprocessing (OpenMP) and Compute Unified Device Architecture (CUDA), that is, graphics processing unit (GPU) computing.
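
The two routes to the CMP normalizing constant can be compared in a few lines. The sketch below is not the implementation of Wu et al. (2013); it assumes the standard series $Z(\lambda, \nu) = \sum_{j \ge 0} \lambda^j / (j!)^{\nu}$, truncated at a hypothetical `n_terms`, and the asymptotic form reported in Shmueli et al. (2005). Both functions work on the log scale to avoid overflow, and their names are illustrative.

```python
import numpy as np


def cmp_log_z_series(lam, nu, n_terms=500):
    """log Z(lam, nu) from the series sum_j lam**j / (j!)**nu, truncated."""
    j = np.arange(n_terms)
    # log(j!) for j = 0, ..., n_terms - 1, built from cumulative sums of logs.
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, n_terms)))))
    log_terms = j * np.log(lam) - nu * log_fact
    m = log_terms.max()  # log-sum-exp trick for numerical stability
    return m + np.log(np.exp(log_terms - m).sum())


def cmp_log_z_asymptotic(lam, nu):
    """Asymptotic approximation to log Z, intended for lam > 10**nu."""
    return (nu * lam ** (1.0 / nu)
            - (nu - 1.0) / (2.0 * nu) * np.log(lam)
            - (nu - 1.0) / 2.0 * np.log(2.0 * np.pi)
            - 0.5 * np.log(nu))


log_z_poisson = cmp_log_z_series(3.0, 1.0)      # nu = 1: Z = exp(lam)
log_z_geometric = cmp_log_z_series(0.5, 0.0)    # nu = 0: Z = 1 / (1 - lam)
log_z_exact = cmp_log_z_series(200.0, 2.0)
log_z_approx = cmp_log_z_asymptotic(200.0, 2.0)
```

The fixed truncation is adequate for the values shown, but more terms are needed when $\nu$ is near zero and $\lambda$ is close to one, which is one setting where the computation becomes expensive.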

15.4 Modeling Dynamics

Dynamic models have long been considered in the non-Gaussian time series context (e.g., Carlin et al., 1992; Fahrmeir, 1992; Fahrmeir and Kaufmann, 1991; Gamerman, 1998; Kitagawa, 1987; West et al., 1985). Such models often take a more “econometrics” flavor, in which one seeks to accommodate multivariate temporal dependence through time-varying parameters. A major distinction between dynamical spatio-temporal models (DSTMs) and traditional multivariate time series models is that (1) there is spatial dependence (typically, nonstationary in space and nonseparable in space and time), (2) the spatial process changing through time is often of very high dimension, and (3) there is a scientific process that drives the way in which this spatial dependence changes through time. That is, there is a fundamental “process” that suggests modeling should be related to the evolution of a spatial process through time rather than simply modeling correlated time series or specifying marginal spatio-temporal dependence structures (Cressie and Wikle, 2011). We consider this process-driven approach here, recognizing that it fits naturally in the aforementioned hierarchical modeling framework. In this setting, we may be able to rely on fairly simple evolution models (e.g., autoregressive processes), but the modeling is complicated by high dimensionality and the need to reduce dimensionality either in terms of the process or in terms of parameter reduction. In addition, one must consider the possibility of modeling more complicated nonlinear scientific processes within the Markovian framework. Here, we focus on the case of discrete space and time, but note that the continuous space, discrete time case is closely related (e.g., Wikle and Cressie, 1999; Wikle and Holan, 2011; Wikle, 2002).

One can evolve the spatio-temporal process $\mathbf{Y}_t$ using the standard approaches from dynamical spatio-temporal models (e.g., Cressie and Wikle, 2011, Chapter 7) or through more traditional econometric-based dynamic linear models (e.g., Gamerman et al., 2007; Gelfand et al., 2005). However, as discussed in Cressie and Wikle (2011, Chapter 7), in the spatio-temporal context, realistic dynamical evolution of these processes requires transition (propagator) matrices that can accommodate real-world dynamical features (e.g., advection, diffusion, growth, etc.). Furthermore, the dimensionality of such processes often makes specification of the transition matrix a formidable challenge in terms of the number of parameters that must be estimated. Consequently, it is fairly typical to consider the evolution of so-called “spatial random effects,” which are the projection coefficients of a basis function expansion of $\mathbf{Y}_t$ (e.g., Wikle and Cressie, 1999). Typically, the underlying dynamics of interest exist on a lower-dimensional manifold, allowing for a reduced rank representation, which also serves to reduce the parameter space associated with the process evolution. Thus, rather than model the spatio-temporal process $\mathbf{Y}_t$ directly, it is often convenient to consider the underlying spatio-temporal process to be decomposed into various components (e.g., Wikle et al., 2001; Wikle, 2003b; Wikle et al., 1998). For example, consider

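\[ \mathbf{Y}_t = \boldsymbol{\mu} + \boldsymbol{\Phi}^{(1)} \boldsymbol{\alpha}_t + \boldsymbol{\Phi}^{(2)} \boldsymbol{\beta}_t + \boldsymbol{\epsilon}_t, \]

where $\mathbf{Y}_t$ is an $n \times 1$ process vector defined at $n$ spatial locations of interest, $\boldsymbol{\mu}$ is an $n \times 1$ spatial mean vector, $\boldsymbol{\Phi}^{(1)}$ is an $n \times p_\alpha$ matrix, $\boldsymbol{\Phi}^{(2)}$ is an $n \times p_\beta$ matrix, $\boldsymbol{\alpha}_t$ and $\boldsymbol{\beta}_t$ are $p_\alpha$- and $p_\beta$-dimensional vectors, respectively, and $\boldsymbol{\epsilon}_t$ is an $n \times 1$ mean zero spatial error process.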

In high-dimensional settings, $\boldsymbol{\Phi}^{(1)}$ is typically a “basis function” matrix, with $\boldsymbol{\alpha}_t$ denoting the corresponding expansion coefficients. The choice of the matrix $\boldsymbol{\Phi}^{(1)}$ in this context has been the source of considerable study in recent years, with many choices available, depending on whether these basis functions are specified (e.g., orthogonal polynomials, multiresolution wavelets or Wendland functions, splines, empirical orthogonal functions (EOFs), etc.), or whether they are in some sense estimated (e.g., discrete kernel convolutions, “predictive processes,” dynamic factor models, etc.). Choices are typically made based on ideology, but should be made on more practical considerations such as whether the basis set is full rank ($p_\alpha = n$), rank reduced (i.e., $p_\alpha \ll n$), or over-complete ($p_\alpha \gg n$), or whether one wishes the $\boldsymbol{\alpha}_t$ coefficients to be spatially referenced (as in the discrete kernel convolution and “predictive process” approaches) or whether they live in “spectral” space. These issues are discussed in depth in Wikle (2010) and Cressie and Wikle (2011, Chapter 7). Our perspective is that these choices should consider the process dynamics, data, and computational demands of the problem at hand.
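
One of the basis choices mentioned above, empirical orthogonal functions, can be sketched with a singular value decomposition. The example is a minimal illustration on synthetic data, not a recommended workflow: `eof_basis` is a hypothetical helper, the rows of `Y` are assumed to hold the process at successive time points, and the two sinusoidal patterns, noise level, and rank `p` are arbitrary.

```python
import numpy as np


def eof_basis(Y, p):
    """Rank-p EOF basis from a (T, n) array whose rows are time points.

    Returns the spatial mean (n,), the basis matrix (n, p), and the
    expansion coefficients (T, p) obtained by projection onto the basis.
    """
    mu = Y.mean(axis=0)
    Yc = Y - mu                                   # remove the spatial mean
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    Phi = Vt[:p].T                                # leading right singular vectors
    alpha = Yc @ Phi                              # projection coefficients
    return mu, Phi, alpha


rng = np.random.default_rng(3)
n, T, p = 50, 30, 2
s = np.linspace(0.0, 1.0, n)
patterns = np.column_stack([np.sin(2 * np.pi * s), np.cos(2 * np.pi * s)])
coeffs = rng.normal(size=(T, 2))
Y = 1.0 + coeffs @ patterns.T + 0.01 * rng.normal(size=(T, n))

mu, Phi, alpha = eof_basis(Y, p)
Y_hat = mu + alpha @ Phi.T                        # rank-reduced reconstruction
rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y - mu)
```

Here two coefficients per time point reproduce a 50-dimensional field almost exactly, which is the rank-reduced case ($p_\alpha \ll n$); the coefficients live in "spectral" space rather than being spatially referenced.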

The choice of $\boldsymbol{\Phi}^{(2)}$ depends on the process $\mathbf{Y}_t$ and the choice of $\boldsymbol{\Phi}^{(1)}$ as well as the computational demands of the problem of interest. For example, if $\boldsymbol{\Phi}^{(1)}$ corresponds to a rank-reduced basis for a large-scale dynamical process, then one might let $\boldsymbol{\Phi}^{(2)}$ correspond to smaller scales, which may have different dynamics (e.g., Gladish and Wikle, 2014; Wikle et al., 2001). Alternatively, $\boldsymbol{\Phi}^{(2)}$ may correspond to covariates, or may be an identity matrix, in which case $\boldsymbol{\beta}_t$ are just “regression” coefficients or residual random effects (likely confounded with $\nu_t$, the time-varying dispersion parameter, and $\boldsymbol{\epsilon}_t$), respectively. Clearly, not all of these components are required or useful in every spatio-temporal model; choices must be made relative to the process and data at hand. We will focus the discussion here on process-based dynamic models for $\boldsymbol{\alpha}_t$.

Let $\boldsymbol{\alpha}_t \equiv (\alpha_{1,t}, \ldots, \alpha_{p_\alpha,t})'$, where, depending on the choice of $\boldsymbol{\Phi}^{(1)}$, the index $i$ in $\alpha_{i,t}$ may correspond to either physical space or “spectral” space. We are typically interested in a Markovian evolution model such as $\boldsymbol{\alpha}_t = \mathcal{M}(\boldsymbol{\alpha}_{t-1}; \boldsymbol{\eta}_t; \boldsymbol{\theta})$, $t = 1, 2, \ldots$, where $\mathcal{M}(\cdot)$ is an evolution operator, $\boldsymbol{\eta}_t$ an error process, and $\boldsymbol{\theta}$ parameters (that may, themselves, vary over space and/or time). Clearly, such a model is too general to be of much use beyond providing a conceptual framework. Rather, we consider the very general parametric class of models suggested by general quadratic nonlinearity (GQN) (Wikle and Holan, 2011; Wikle and Hooten, 2010):

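\[ \alpha_{i,t} = \sum_{j=1}^{p_\alpha} m_{i,j,t}\, \alpha_{j,t-1} + \sum_{k=1}^{p_\alpha} \sum_{\ell=1}^{p_\alpha} m_{i,k\ell}\, \alpha_{k,t-1}\, g(\alpha_{\ell,t-1}; \boldsymbol{\theta}_g) + \eta_{i,t}, \tag{15.6} \]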

for $i = 1, \ldots, p_\alpha$, where $\eta_{i,t}$ is an error process (typically assumed to be a mean zero Gaussian process with some variance–covariance matrix given by $\mathbf{Q}_\eta$), $m_{i,j,t}$ are linear interaction (transition) coefficients, $m_{i,k\ell}$ are quadratic interaction coefficients, and $g(\cdot)$ is some transformation of $\alpha_{\ell,t}$ that depends on parameters $\boldsymbol{\theta}_g$ and gives the process more generality than the simple dyadic interactions in the $\alpha$ coefficients alone. As described in Wikle and Hooten (2010), this framework is exceptionally flexible in that it can account for an extensive set of real-world mechanistic processes. Wikle and Holan (2011) show that this extends to higher-order interactions and the integro-difference continuous space case. However, even with quadratic interactions, the number of parameters that need to be estimated is on the order of $p_\alpha^3$, which is a substantial curse of dimensionality. The efficient parameterization of (15.6) becomes the principal challenge in DSTM specification.
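
A single step of the GQN evolution in (15.6) can be written compactly, which also makes the cubic growth in the number of quadratic coefficients visible. The sketch below is illustrative only: `gqn_step` is a hypothetical function, the coefficient arrays are arbitrary random values rather than estimates, and `np.tanh` is an assumed stand-in for the transformation $g(\cdot)$.

```python
import numpy as np


def gqn_step(alpha_prev, m_lin, m_quad, g, eta):
    """One step of a general quadratic nonlinear evolution.

    alpha_prev : (p,) state at time t - 1
    m_lin      : (p, p) linear interaction coefficients
    m_quad     : (p, p, p) quadratic interaction coefficients, indexed [i, k, l]
    g          : elementwise transformation applied to the state
    eta        : (p,) additive error at time t
    """
    linear = m_lin @ alpha_prev
    # sum over k and l of m_quad[i, k, l] * alpha_prev[k] * g(alpha_prev)[l]
    quadratic = np.einsum("ikl,k,l->i", m_quad, alpha_prev, g(alpha_prev))
    return linear + quadratic + eta


rng = np.random.default_rng(4)
p = 4
m_lin = 0.5 * np.eye(p)
m_quad = 0.01 * rng.normal(size=(p, p, p))
alpha_prev = rng.normal(size=p)
eta = np.zeros(p)

alpha_next = gqn_step(alpha_prev, m_lin, m_quad, np.tanh, eta)
n_quadratic = m_quad.size   # p**3 quadratic coefficients
```

Even with a state of dimension four there are 64 quadratic coefficients, and setting `m_quad` to zero recovers a purely linear transition.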

There are several simplications and modeling approaches that can facilitate the spec-

ication of the parameter structure in (15.6). First, one can use the structure suggested

by discretization of relevant mechanistic models (e.g., partial differential equations such

as those given by reaction–diffusion and advection–diffusion processes) to simplify the

parameters given by the linear and quadratic interaction coefcients in both physical and

Galerkin (spectral) space (e.g., Hooten and Wikle, 2008; Wikle, 2003a; Wikle and Hooten,

2010; Wikle et al., 2001; Xu and Wikle, 2007). It is important to recognize that one uses

these mathematical model representations to reduce the number of parameters, but this

leaves many parameters that still must be estimated or modeled. Thus, such an approach

is referred to as an mechanistically motivated model (e.g., see tutorial discussion in Cressie

and Wikle, 2011, Chapters 6 and 7).

In situations where $\boldsymbol{\Phi}^{(1)}$ and $\boldsymbol{\Phi}^{(2)}$ correspond to large and medium/small-scale spatial basis functions, respectively, Wikle and Hooten (2010) make a case for dimension reduction based on arguments from turbulence theory. Wikle and Holan (2011) show that estimation and Bayesian inference can be substantially improved if one applies a stochastic search variable selection (e.g., George and McCulloch, 1993, 1997) approach to the linear and quadratic interaction parameters. Gladish and Wikle (2014) show that one can also effectively reduce the parameter space in this scenario if one assumes that medium scales influence the evolution of the large scales, but large scales do not influence the evolution of medium scales, which is also motivated by certain types of physical processes.

Critically, many real-world processes are very reasonably approximated by linear or quasi-linear dynamics, that is, the case where the quadratic interaction coefficients in (15.6) are zero. In this case, the model reduces to a vector autoregressive model with time-varying coefficients,

\[ \boldsymbol{\alpha}_t = \mathbf{M}_t \boldsymbol{\alpha}_{t-1} + \boldsymbol{\eta}_t, \tag{15.7} \]

where $\mathbf{M}_t = \{m_{i,j,t}\}_{i,j}$ is the $p_\alpha \times p_\alpha$ time-varying transition matrix and $\boldsymbol{\eta}_t \equiv (\eta_{1,t}, \ldots, \eta_{p_\alpha,t})' \sim \text{Gau}(\mathbf{0}, \mathbf{Q}_\eta)$ is the error process. Of course, the model given in (15.7) is just a multivariate autoregressive time series process. However, as stated previously, in dynamic spatio-temporal process modeling, the dimensionality of $\mathbf{M}_t$ may preclude direct estimation, and more critically, one should take into account the mechanistic nature of the dynamical evolution in the parameterization of this matrix. The approaches discussed earlier associated with mechanistically motivated specifications, spectral parameter reduction, and stochastic search variable selection can be used to facilitate this modeling. Importantly, it is often the case in real-world processes that a static transition matrix ($\mathbf{M}_t \equiv \mathbf{M}$) is adequate for modeling the spatio-temporal dependence.
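
The static linear case can be explored with a short simulation. The sketch below assumes a hypothetical tridiagonal transition matrix `M`, chosen only because neighbor-to-neighbor weights loosely resemble a discretized diffusion, and an identity error covariance `Q`. The least-squares fit is included purely as a check on the simulation; it is not the hierarchical Bayesian estimation discussed in this chapter, and it is feasible here only because the state dimension is tiny.

```python
import numpy as np


def simulate_var1(M, Q, T, rng):
    """Simulate alpha_t = M @ alpha_{t-1} + eta_t with eta_t ~ Gau(0, Q)."""
    p = M.shape[0]
    L = np.linalg.cholesky(Q)            # so that L @ z has covariance Q
    alpha = np.zeros((T + 1, p))         # alpha[0] is the initial state (zero)
    for t in range(1, T + 1):
        eta = L @ rng.standard_normal(p)
        alpha[t] = M @ alpha[t - 1] + eta
    return alpha


def ols_transition(alpha):
    """Least-squares estimate of a static transition matrix."""
    A0, A1 = alpha[:-1], alpha[1:]
    # Rows satisfy A1 ~= A0 @ M.T, so lstsq returns an estimate of M.T.
    Mt, *_ = np.linalg.lstsq(A0, A1, rcond=None)
    return Mt.T


rng = np.random.default_rng(5)
M = np.array([[0.5, 0.2, 0.0],
              [0.2, 0.5, 0.2],
              [0.0, 0.2, 0.5]])
Q = np.eye(3)
alpha = simulate_var1(M, Q, 5000, rng)
M_hat = ols_transition(alpha)
spectral_radius = np.max(np.abs(np.linalg.eigvals(M)))
```

The spectral radius of this `M` is below one, so the simulated process does not explode. With $p_\alpha^2$ free entries, an unstructured fit of this kind quickly becomes impractical as the state dimension grows, which is the motivation for the structured parameterizations described above.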
