15
Hierarchical Dynamic Generalized Linear Mixed
Models for Discrete-Valued Spatio-Temporal Data
Scott H. Holan and Christopher K. Wikle
CONTENTS
15.1 Introduction
15.2 Hierarchical Models
15.3 Data Models for Discrete-Valued Spatio-Temporal Data
15.4 Modeling Dynamics
15.5 Example: Forecasting Migratory Bird Settling Patterns
15.5.1 Breeding Population Survey Data
15.5.2 Spatio-Temporal Poisson Models
15.5.3 Forecasting Application: Breeding Population Survey
15.6 Conclusion
References
15.1 Introduction
Discrete-valued spatio-temporal data arise frequently across a diverse range of subject-
matter disciplines, including epidemiology, small area estimation in federal surveys,
environmental science, and ecology, among others. In general, modeling this type of data
can prove challenging due to the complexity of the observed data and underlying dynam-
ical processes (e.g., see Cressie and Wikle, 2011, and the references therein). In this chapter,
we focus primarily on modeling count data using spatio-temporal generalized linear mod-
els within a Bayesian hierarchical modeling (BHM) framework. In particular, we review
some of the common methods in this context and describe some recent advances. For
completeness, we briefly discuss other types of discrete-valued
spatio-temporal data, such as Bernoulli data. Finally, we provide a succinct
real data illustration outlining the prediction of waterfowl migratory patterns across the
north-central United States and Canada.
In the context of modeling count spatio-temporal data, several methods have emerged,
including auto-Poisson models (Besag, 1974), generalized linear dynamical (spatio-
temporal) mixed models (Wikle, 2002), and Bayesian nonparametric methods based on
Dirichlet process mixtures (Kottas et al., 2008), among others. The direction pursued here
focuses on generalized linear mixed models (GLMMs) (see McCulloch et al., 2001, for a
brief overview of GLMMs). Specically, we consider generalized linear models (GLMs)
with a latent Gaussian process model (e.g., see the overview in Cressie and Wikle, 2011,
328 Handbook of Discrete-Valued Time Series
and the references therein). In this context, we have a non-Gaussian data model along with
a latent dynamic Gaussian model for the underlying unobserved process (Section 15.2)
and, thus, the latent random effects cannot be integrated out analytically (Verbeke and
Molenberghs, 2009). From this perspective, the models we describe are similar to the
dynamic linear models framework in the time series (non-spatial) case. For further discus-
sion surrounding Bayesian dynamic linear models see Gamerman et al. (2015; Chapter 8 in
this volume) and the references therein.
To date, there have been many methodological contributions in the area of BHMs for
count-valued spatio-temporal data. For example, Waller et al. (1997) consider a spatio-
temporal count model for mapping disease rates, where the observations are assumed
to come from a Poisson distribution. In the context of ecological modeling, Wikle (2003)
introduces a Bayesian hierarchical spatio-temporal Poisson model to predict the relative
population abundance of house nches over the eastern United States. Wikle and Anderson
(2003) propose a spatio-temporal zero-inated Poisson model that uses exogenous climate
processes to model tornado counts.
In another application area, Wikle and Royle (2005) propose a dynamic spatio-temporal
exponential family (Poisson) model for selecting sampling locations to estimate July
brood counts in the Prairie Pothole Region of the United States.
In contrast, Schrödle and Held (2011) describe spatio-temporal disease mapping mod-
els using integrated nested Laplace approximations (INLA) to facilitate fast computation
in the context of space–time count data. Further, Lopes et al. (2011) introduce a class of
spatio-temporal latent factor models for observations belonging to the exponential fam-
ily of distributions. However, the models are illustrated using a Bernoulli data example
to model rainfall. Finally, Wu et al. (2013) develop a class of Bayesian Conway-Maxwell
Poisson (CMP) models with dynamic dispersion and illustrate the approach by estimating
migratory waterfowl settling patterns.
The area of discrete-valued spatio-temporal modeling is expansive in terms of both
methodological contributions and applications. The previous list of contributions is in no
way meant to be exhaustive. Instead, it serves to illustrate the rich literature that exists on
the subject. For further discussion, see Cressie and Wikle (2011) and the references therein.
This chapter proceeds as follows. Section 15.2 provides a general description of spatio-
temporal modeling from a BHM perspective. Specifically, this section reviews the Bayesian
hierarchical framework and details effective partitioning of the model hierarchy in terms
of models for the observed data, latent processes, and parameters. Section 15.3 discusses
various data models for discrete-valued spatio-temporal data, whereas modeling dynamics
is pursued in Section 15.4. Section 15.5 provides an illustration of modeling spatio-temporal
count data in the context of an application to forecasting migratory bird settling patterns.
Specically, this section methodologically illustrates the use of a spatio-temporal Poisson
model, with a latent dynamic Gaussian process for the Poisson intensity parameter through
a real data example. Finally, Section 15.6 provides concluding discussion.
15.2 Hierarchical Models
The hierarchical paradigm has experienced signicant growth over the past two decades.
The original ideas behind the process-based hierarchical modeling approach, as presented
here, emerged largely out of the work of Berliner (1996) and have been further exposited
by Wikle et al. (1998), Wikle (2003b), Cressie et al. (2009), and Wikle et al. (2013), among
others. This approach is conceptually straightforward and it provides an extremely rich
framework for modeling complex dependence structures in the context of discrete-valued
spatio-temporal processes. Importantly, in addition to the process-based emphasis, the
hierarchical framework presented here also emphasizes modeling parameters, which is
often not the case in nested error regression-type hierarchical models.
Although the hierarchical modeling paradigm has become fairly well established (e.g.,
see Cressie and Wikle, 2011, and the references therein), we provide a brief description
here for those readers less familiar with these ideas. The main idea underlying the BHMs
presented here is to consider a joint probability model for the data, process, and parame-
ters, which is generally specified through conditionally linked model components; that
is, the data conditioned on the process and parameters and the process conditioned on the
parameters. Several references focus on this type of hierarchical thinking including Royle
and Dorazio (2008) and Cressie and Wikle (2011), among others, whereas more traditional
presentations of hierarchical modeling can be found in Banerjee et al. (2003), Carlin and Louis
(2011), Gelman et al. (2013), and the references therein.
Synthesis and effective utilization of information, both from direct and from indirect
sources, are two paramount objectives in statistical modeling and data analysis. In fact,
both direct and indirect sources of information play a key role in statistical modeling and
often include expert opinion, physical laws, and previous empirical results. For specificity,
consider the case where we have an underlying scientific process of interest, denoted by
Y (a spatio-temporal process). Associated with this process we also have observed data,
say Z. We assume that we have parameters $\theta_Z$ associated with the measurement process
Z that might account for differences in the support and representativeness between Z
and the underlying true process Y defined at a given resolution of interest. Additionally,
we assume that there are some parameters $\theta_Y$, typically associated with the evolution
operator and innovation covariances, that describe the dynamics of the true underlying
process of interest, Y.
Let [Z|Y] and [Y] denote the conditional distribution of Z given Y and the marginal distri-
bution of Y, respectively. Then, assuming conditional independence of the parameters and
using the law of total probability, the joint probability distribution of the data and process
given the parameters can be decomposed as
$$[Z, Y \mid \theta_Z, \theta_Y] = [Z \mid Y, \theta_Z]\,[Y \mid \theta_Y], \tag{15.1}$$
where $[Z \mid Y, \theta_Z]$ is the data distribution (or “data model,” assuming conditional
independence) and $[Y \mid \theta_Y]$ denotes the process distribution (or “process model”).
In traditional statistics, the data Z are typically given some specified distributional form
along with associated parameters $\theta = (\theta_Z, \theta_Y)$ corresponding to the spatio-temporal mean,
variances, and covariances. Although distributional assumptions for Z can be relaxed in the
context of discrete-valued spatio-temporal data (e.g., Kottas et al., 2008), we limit our dis-
cussion to parametric models and focus primarily on count-valued spatio-temporal data.
Integrating out the random process Y in (15.1) results in [Z|θ], in which case interest resides
in estimating the parameters given the data. The disadvantage of such estimation is that
it eliminates explicit estimation of the underlying true latent process Y. Instead, the distri-
bution for Y is implicitly included through the first and second moments as a result of the
integration.

Modeling spatio-temporal count data (or count time series for that matter) can pro-
ceed either from an observation-driven perspective or using a process-driven (parameter-
driven) approach. By taking a process-based approach (i.e., explicitly modeling Y) several
advantages arise. First, in many applications, one is actually interested in predicting the
true underlying latent process Y, rather than just accounting for the co-variability. Second,
given the complexity and high dimensionality of many real-world observed processes, it
is often extremely difcult to specify the dependence structure associated with Z (e.g., due
to non-Gaussianity, nonlinearity in time, and/or nonstationarity in space and/or time).
Consequently, as a result of needing to specify a realistic dependence structure, likelihood-
based inference in this context is challenging. In contrast, by placing emphasis on modeling
the process Y instead, one can directly incorporate scientific insight into the model and
more easily account for measurement (and/or sampling) and process uncertainty. For
example, Markovian approximations and spatially and/or time-varying parameters can
be readily incorporated in the model hierarchy. In other words, the hierarchical (condi-
tional) specification allows extremely complicated marginal dependence structures to be
replaced by a more scientific specification of the conditional mean as a random process at a
lower stage in the model hierarchy. This type of modeling is analogous to the traditional
mixed model setting, where the practitioner must choose between the marginal model that
arises from integrating out the random effects or the conditional model, where the random
effects are predicted and the conditional covariance of the data model is less complicated
(Demidenko, 2013). In contrast to the linear mixed model case, the generalized linear mixed
model case, which is the focus here, is significantly more complicated. Importantly, for non-
Gaussian data models, it is seldom possible to integrate out the random effects analytically;
that is, the integration rarely results in a
closed-form solution. Consequently, discrete-valued dynamic spatio-temporal generalized
linear mixed models are typically quite computationally demanding, even after some form
of dimension reduction.
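To make the lack of a closed form concrete, the following minimal Python sketch considers a single count with a Poisson data model and a Gaussian latent effect on the log scale, and approximates the marginal distribution [Z|θ] by numerical quadrature. The toy model, the function names, the parameter values, and the use of the trapezoid rule are all illustrative assumptions rather than anything specified in this chapter.

```python
import math

def poisson_logpmf(z, lam):
    """Log Poisson probability of count z given mean lam > 0."""
    return z * math.log(lam) - lam - math.lgamma(z + 1)

def normal_pdf(y, mean, sd):
    """Gaussian density of the latent effect y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def marginal_pmf(z, mean, sd, n_grid=1601, width=8.0):
    """Trapezoid-rule approximation of
    [Z = z | theta] = integral of Poisson(z | exp(y)) * N(y | mean, sd^2) dy.
    The integral has no closed form, so it is evaluated on a grid."""
    lo = mean - width * sd
    step = 2.0 * width * sd / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        y = lo + i * step
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * math.exp(poisson_logpmf(z, math.exp(y))) * normal_pdf(y, mean, sd)
    return total * step

# Hypothetical parameter values theta = (MEAN, SD), for illustration only.
MEAN, SD = 1.0, 0.5
probs = [marginal_pmf(z, MEAN, SD) for z in range(150)]
marg_mean = sum(z * p for z, p in enumerate(probs))
marg_var = sum(z * z * p for z, p in enumerate(probs)) - marg_mean ** 2
```

Under these assumptions, the marginal mean should agree with exp(MEAN + SD^2 / 2), and the marginal variance exceeds the marginal mean, which is the overdispersion induced by the latent effect. Even in this one-dimensional case a numerical integral is needed for every count, which hints at the cost in high-dimensional spatio-temporal settings.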
In general, interest resides in estimating the posterior distribution of the process and
parameters given the data. Using Bayes' theorem, the full BHM can be represented as
$$[Y, \theta \mid Z] \propto [Z \mid Y, \theta_Z]\,[Y \mid \theta_Y]\,[\theta_Z, \theta_Y], \tag{15.2}$$
where $\theta = (\theta_Y, \theta_Z)$ and it is necessary to specify a prior distribution for $[\theta_Z, \theta_Y]$. Note that
in (15.2) the normalizing constant integrates over both the process Y and the parameters
$\theta_Z$ and $\theta_Y$.
Importantly, this representation facilitates a conditional way of thinking about compli-
cated applications in a probabilistically consistent manner and naturally provides a means
of quantifying uncertainty. In the context of spatio-temporal count-valued data, the data
model (i.e., $[Z \mid Y, \theta_Z]$) will follow a count distribution such as a Poisson, negative binomial
(NegBin), or CMP, among others (see Section 15.3).
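As a rough illustration of how the three stages in (15.2) combine, the sketch below evaluates the unnormalized log posterior for a deliberately simple toy model: independent Poisson counts with log-means Y, independent Gaussian latent values, and vague Gaussian priors. The toy model has no data-model parameters (θ_Z is empty) and no spatial or temporal dependence; these choices, along with all function names, prior scales, and data values, are assumptions made only for illustration.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def log_data_model(z, y):
    """log [Z | Y]: independent Poisson counts with means exp(y_i)."""
    return sum(zi * yi - math.exp(yi) - math.lgamma(zi + 1) for zi, yi in zip(z, y))

def log_process_model(y, mu, log_sigma):
    """log [Y | theta_Y]: independent Gaussian latent values (toy process model)."""
    sigma = math.exp(log_sigma)
    return sum(log_normal_pdf(yi, mu, sigma) for yi in y)

def log_prior(mu, log_sigma):
    """log [theta_Y]: vague Gaussian priors (hypothetical choice)."""
    return log_normal_pdf(mu, 0.0, 10.0) + log_normal_pdf(log_sigma, 0.0, 2.0)

def log_posterior_unnorm(y, mu, log_sigma, z):
    """Right-hand side of (15.2) on the log scale, up to the normalizing constant."""
    return log_data_model(z, y) + log_process_model(y, mu, log_sigma) + log_prior(mu, log_sigma)

# Hypothetical counts and a candidate value of (Y, theta_Y).
z_obs = [3, 0, 5, 2]
y_cand = [1.0, -0.5, 1.5, 0.7]
value = log_posterior_unnorm(y_cand, mu=0.5, log_sigma=0.0, z=z_obs)
```

A function of this form is what a sampler (e.g., a Metropolis-type algorithm) would evaluate repeatedly; the normalizing constant in (15.2) is never computed.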
An important aspect of (15.2) is that the right-hand side can be further decomposed
into several submodels. For example, assuming conditional independence given the true
underlying process, multiple data sets with different spatial and/or temporal supports
could be accommodated through the following data model specification:
$$\left[Z^{(1)}, Z^{(2)} \mid Y, \theta_Z\right] = \left[Z^{(1)} \mid Y, \theta_Z^{(1)}\right]\left[Z^{(2)} \mid Y, \theta_Z^{(2)}\right], \tag{15.3}$$

where, for $j = 1, 2$, $Z^{(j)}$ and $\theta_Z^{(j)}$ correspond to the observations and parameters from the
jth data set, respectively (e.g., see Wang et al., 2012). In this context, $Z^{(1)}$ and $Z^{(2)}$ need
not have the same data distribution. Although, in practice, the assumption of conditional
independence is often reasonable across a wide range of applications, when possible, this
assumption should be validated.
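The factorization in (15.3) can be sketched in a few lines of Python. Here, as a purely hypothetical example, the first data set consists of Poisson counts and the second of Bernoulli detections, each observed at a different subset of locations of a common latent process Y. The logit form for the detections, the scalar parameters `theta1` and `theta2`, and all data values are assumptions introduced for illustration.

```python
import math

def log_poisson_data(z, y, sites, offset):
    """log [Z1 | Y, theta1]: Poisson counts with log-mean offset + y[site]."""
    total = 0.0
    for zi, s in zip(z, sites):
        log_mean = offset + y[s]
        total += zi * log_mean - math.exp(log_mean) - math.lgamma(zi + 1)
    return total

def log_bernoulli_data(z, y, sites, intercept):
    """log [Z2 | Y, theta2]: detections with logit(p) = intercept + y[site]."""
    total = 0.0
    for zi, s in zip(z, sites):
        eta = intercept + y[s]
        # log p = -log(1 + exp(-eta)); log(1 - p) = -log(1 + exp(eta))
        total += -math.log1p(math.exp(-eta)) if zi == 1 else -math.log1p(math.exp(eta))
    return total

def log_joint_data(z1, z2, y, sites1, sites2, theta1, theta2):
    """Log of the right-hand side of (15.3): the two data models simply add."""
    return log_poisson_data(z1, y, sites1, theta1) + log_bernoulli_data(z2, y, sites2, theta2)

# Hypothetical latent process on five locations and two partially overlapping surveys.
y = [0.2, 1.1, -0.4, 0.8, 0.0]
counts, count_sites = [1, 4, 2], [0, 1, 3]
detections, detect_sites = [0, 1], [2, 4]
log_lik = log_joint_data(counts, detections, y, count_sites, detect_sites,
                         theta1=0.0, theta2=0.5)
```

Because the two terms are linked only through Y, each data set can keep its own distribution and support, which is the practical appeal of the conditional independence assumption.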
For many applications, it is also natural to decompose the model for the process into
subcomponents. In particular, in the context of discrete-valued spatio-temporal data, it
is often natural to assume that the process has a Markov structure in time. Assuming a
first-order Markov structure in time yields the following decomposition:
$$[Y] = [\{Y_0, Y_1, \ldots, Y_T\}] = [Y_0] \prod_{t=1}^{T} [Y_t \mid Y_{t-1}].$$
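A minimal sketch of this first-order Markov factorization is given below for a scalar process, assuming (for illustration only) a stationary Gaussian autoregression: Y_t given Y_{t-1} is Gaussian with mean PHI * Y_{t-1} and variance SIGMA2. The names `PHI` and `SIGMA2`, the initial distribution, and the numerical values are hypothetical. For two time points, the factored log density is compared with the bivariate Gaussian density written out directly.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_markov_process(y, phi, sigma2):
    """log [Y] = log [Y_0] + sum over t of log [Y_t | Y_{t-1}] for a Gaussian AR(1),
    with Y_0 drawn from the stationary distribution N(0, sigma2 / (1 - phi^2))."""
    log_density = log_normal_pdf(y[0], 0.0, sigma2 / (1.0 - phi ** 2))  # [Y_0]
    for t in range(1, len(y)):
        log_density += log_normal_pdf(y[t], phi * y[t - 1], sigma2)     # [Y_t | Y_{t-1}]
    return log_density

def log_bivariate_normal(y0, y1, var, rho):
    """Joint log density of (Y_0, Y_1) with common variance var and correlation rho."""
    quad = (y0 ** 2 - 2.0 * rho * y0 * y1 + y1 ** 2) / (var * (1.0 - rho ** 2))
    return (-math.log(2.0 * math.pi) - math.log(var)
            - 0.5 * math.log(1.0 - rho ** 2) - 0.5 * quad)

# Hypothetical parameter values and a short path of the process.
PHI, SIGMA2 = 0.7, 0.5
factored = log_markov_process([0.3, -0.2], PHI, SIGMA2)
direct = log_bivariate_normal(0.3, -0.2, SIGMA2 / (1.0 - PHI ** 2), PHI)
```

The two values agree, but the factored form only ever requires one-step conditional densities, which is what keeps the process model tractable as T grows.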
Alternatively, the process model could be further decomposed to accommodate multivariate
structure. In this case, letting $[Y] = [Y^{(1)}, Y^{(2)}]$, the process model can be expressed as
$[Y] = [Y^{(2)} \mid Y^{(1)}]\,[Y^{(1)}]$, where the order of conditioning is usually suggested by the specific
application and chosen by the practitioner (Royle and Berliner, 1999).
There is a vast literature on modeling non-Gaussian time series using a state-
space approach (e.g., Carlin et al., 1992; Fahrmeir, 1992; Fahrmeir and Kaufmann, 1991;
Gamerman, 1998; Kitagawa, 1987; West et al., 1985, among others). One major distinc-
tion between models in the time series case and the models described here is that in the
spatio-temporal setting we now need to consider spatial dependence in addition to serial
correlation, with these two dependence structures typically being nonseparable. Also, in
contrast to the pure time series case, the spatio-temporal case often suffers from being
extremely high dimensional. Specically, consider a process that is measured at n locations
and T times. Going from the pure time series case to the spatio-temporal setting results in an
increase of (n1)T observations. Consequently, a necessary component of spatio-temporal
modeling resides in effective dimension reduction.
In the context of discrete-valued spatio-temporal models, we assume that the data model
comes from the exponential family and is non-Gaussian (e.g., Bernoulli, Poisson, NegBin,
etc.); see Section 15.3. In particular, using similar notation to Cressie and Wikle (2011)
we assume that $Z_t$ denotes an $m_t$-dimensional vector of observations at time t from the
exponential family of distributions. That is,
$$[Z_t \mid \gamma_t] = \exp\{\gamma_t' Z_t - b_t(\gamma_t) + c_t(Z_t)\},$$
where $\gamma_t$ denotes an $m_t$-dimensional set of natural parameters that depend on the process
$Y_t$ and $E(Z_t \mid \gamma_t) = \mu_t$. Then, assuming the usual regularity conditions for the exponential
family of distributions (McCulloch et al., 2001), we have that
$$g(\mu_t) = X_t \beta + H_t(\theta_h)\,Y_t + \eta_t, \tag{15.4}$$
where g(·) is a known link function, $Y_t$ is an n-dimensional spatial process vector of interest,
$X_t$ is a matrix of covariates (assumed known), β are the unknown “regression” coefficients
associated with $X_t$, $H_t$ is the $m_t \times n$ observation matrix, which is often assumed known
but could also be specified in terms of the unknown hyperparameters $\theta_h$, and $\eta_t$ is an
independent (across time) additive error term. It is important to note that, depending on
the particular application, the additive error term, $\eta_t$, may not be warranted.
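As a concrete instance of (15.4), the sketch below takes the Poisson case with the log link, so that the mean vector is the exponential of the linear predictor, and writes the Poisson log density in its standard natural exponential family form with natural parameter log(mu), b equal to the sum of the means, and c equal to minus the sum of log factorials. The choice of an incidence matrix for H_t, the dimensions (n = 4 process locations, m_t = 2 observations), and all numerical values are hypothetical and serve only as an illustration.

```python
import math

def mat_vec(matrix, vector):
    """Matrix-vector product for small lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def poisson_mean(X_t, beta, H_t, Y_t, eta_t):
    """mu_t from (15.4) with g = log: mu_t = exp(X_t beta + H_t Y_t + eta_t)."""
    xb = mat_vec(X_t, beta)
    hy = mat_vec(H_t, Y_t)
    return [math.exp(a + b + e) for a, b, e in zip(xb, hy, eta_t)]

def expfam_logpdf(Z_t, mu_t):
    """Poisson log density as gamma' Z - b(gamma) + c(Z), where
    gamma = log(mu), b(gamma) = sum exp(gamma_i), c(Z) = -sum log(Z_i!)."""
    gamma = [math.log(m) for m in mu_t]
    inner = sum(g * z for g, z in zip(gamma, Z_t))
    b = sum(math.exp(g) for g in gamma)
    c = -sum(math.lgamma(z + 1) for z in Z_t)
    return inner - b + c

# Hypothetical inputs: n = 4 process locations, m_t = 2 observed locations.
Y_t = [0.5, -0.2, 1.0, 0.3]
H_t = [[1, 0, 0, 0], [0, 0, 1, 0]]   # incidence matrix selecting locations 1 and 3
X_t = [[1.0, 0.2], [1.0, -0.4]]      # intercept plus one covariate
beta = [0.1, 0.5]
eta_t = [0.0, 0.0]                   # additive error switched off in this example
mu_t = poisson_mean(X_t, beta, H_t, Y_t, eta_t)
Z_t = [2, 4]
log_density = expfam_logpdf(Z_t, mu_t)
```

In this example H_t simply maps the latent process to the observed locations; a change-of-support or basis-function matrix could take its place without altering the rest of the computation.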