332 Handbook of Discrete-Valued Time Series
15.3 Data Models for Discrete-Valued Spatio-Temporal Data
As previously alluded to, discrete-valued spatio-temporal data arise across a broad range
of subject-matter disciplines, with the specific distribution chosen to facilitate the analysis
under consideration. Although, in principle, the framework presented here can
accommodate virtually any discrete-valued distribution, we briefly describe a few of the
more popular distributions that arise in practice. The distributions displayed here are not
meant to constitute an exhaustive list of potential data model distributions that could be
employed. Instead, the distributions we describe are merely meant to demonstrate the rich
class of discrete-valued spatio-temporal models that can be constructed under the BHM
(latent Gaussian process) framework described in Section 15.2. A specific example using
the Poisson distribution is considered in Section 15.5.2.
Again, although we mainly focus on count-valued spatio-temporal data, many other
discrete-valued data models could be considered. For example, when considering
spatio-temporal binary data it is natural to use logistic regression such that the conditional
distribution of $\mathbf{Z}_t$ given the $n$-dimensional vector of probabilities $\mathbf{p}_t$
is Bernoulli; that is,
$$\mathbf{Z}_t \mid \mathbf{p}_t \sim \text{ind. Bern}(\mathbf{H}_t \mathbf{p}_t).$$
In this case, it is natural to model $\mathbf{p}_t$ through the logit link (where, through an
abuse of notation, we define $\text{logit}(\mathbf{p}_t) = \log\{\mathbf{p}_t / (1 - \mathbf{p}_t)\}$
to be the logit transform applied to each element of $\mathbf{p}_t$). Alternatively, the probit
link function could be considered in place of the
logit link and in many cases when dealing with binary spatio-temporal data use of a probit
link function along with data augmentation will facilitate computation (Albert and Chib,
1993). Although not considered here, the Bernoulli data model also arises in the context of
spatio-temporal auto-logistic models (Zhu and Zheng [2015; Chapter 17 in this volume];
Zhu et al., 2008) and agent-based models (Hooten and Wikle, 2010; Wikle and Hooten
[2015; Chapter 16 in this volume]).
In contrast to spatio-temporal binary data, one could consider a polychotomous out-
come (i.e., outcomes with more than two ordered categories). In this case, a natural data
model distribution is the multinomial. This gives rise to a spatio-temporal multinomial
logistic regression. That is, assuming the usual conditions for the K category multinomial
distribution, the data model is given by
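$$\mathbf{Z}_t \mid \mathbf{n}_t, \mathbf{p}_{1,t}, \ldots, \mathbf{p}_{K,t} \sim \text{ind. Mult}(\mathbf{n}_t, \mathbf{H}_t \mathbf{p}_{1,t}, \ldots, \mathbf{H}_t \mathbf{p}_{K,t}). \tag{15.5}$$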
In (15.5), one possible model for $\mathbf{p}_{k,t}$ ($k = 1, \ldots, K$) is the multinomial
logit (e.g., see Congdon, 2007; Arab et al., 2012). Under this construction, it is natural to
model the underlying process, the multinomial logit of $\mathbf{p}_{k,t}$ ($k = 1, \ldots, K$),
as a latent Gaussian spatio-temporal process.
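As a concrete illustration of the multinomial logit, the following minimal Python sketch maps latent values at a single location and time to category probabilities. It assumes a baseline-category parameterization (category $K$ as the reference), which is one common convention and not necessarily the one used in the cited references; the function name and the example latent values are hypothetical.

```python
import math

def multinomial_logit_probs(latent):
    """Map K-1 latent log-odds (relative to baseline category K) to K probabilities.

    Assumption: latent[k] = log(p_k / p_K) for the K-1 non-baseline categories.
    """
    # Subtract the largest exponent (the baseline contributes an implicit 0)
    # so that the exponentials cannot overflow.
    shift = max(max(latent), 0.0)
    weights = [math.exp(eta - shift) for eta in latent] + [math.exp(-shift)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical latent Gaussian process values at one location and time (K = 3).
probs = multinomial_logit_probs([0.4, -1.2])
print("category probabilities:", probs)
print("sum of probabilities:", sum(probs))
```

In a full model the latent values would come from a spatio-temporal Gaussian process rather than being fixed by hand.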
There are several popular choices for the data model when considering count-valued data
(e.g., Poisson, NegBin, CMP, etc.), with the specific choice often based on the level of
dispersion and/or computational considerations. In most cases, the data exhibit
overdispersion (i.e., the variance is greater than the mean). Although not as common,
there are also several examples of underdispersion (i.e., the variance is less than the
mean) (Ridout and Besbeas, 2004). The latter case of underdispersion typically arises in
333 Hierarchical Dynamic Generalized Linear Mixed Models
situations where the observations occur as “rare events.” Finally, in practice, the case of
equidispersion (i.e., the variance is equal to the mean) is rarely satisfied.
The Poisson distribution has become the de facto distribution when it comes to mod-
eling spatio-temporal count-valued data and is the distribution we use for illustration
(Section 15.5). Therefore, we defer detailed discussion of this distribution until Section 15.5.
Although the model assumes equidispersion, the case of overdispersion is readily facili-
tated through a spatio-temporal random effect in a Gaussian latent process model for the
logarithm of the Poisson intensity parameter. Assuming all locations are observed at each
time point there is no need to include a mapping matrix from the observations to the process
model for the intensity, unless interest resides in an aggregate or other (possibly weighted)
function of the underlying process. Letting $\boldsymbol{\lambda}_t$ denote the spatial intensity
process at time $t$, a typical model specification for a spatio-temporal Poisson model is given by
$$\mathbf{Z}_t \mid \boldsymbol{\lambda}_t \sim \text{ind. Pois}(\mathbf{H}_t \boldsymbol{\lambda}_t),$$
where $\log(\boldsymbol{\lambda}_t)$ can be specified similarly to the right-hand side of (15.4).
Another popular distribution for modeling overdispersed count-valued spatio-temporal
data is the NegBin (Greene, 2008). In contrast to the Poisson data model, this data model
has an explicit parameter that controls the level of overdispersion. Assuming the “intensity”
parameter, $\boldsymbol{\lambda}_t$, and dispersion parameter, $\nu$, are greater than zero, the
model can be specified as
$$\mathbf{Z}_t \mid \boldsymbol{\lambda}_t, \nu \sim \text{ind. NegBin}(\mathbf{H}_t \boldsymbol{\lambda}_t, \nu),$$
where $\log(\boldsymbol{\lambda}_t)$ can be specified similarly to the right-hand side of (15.4)
and $\nu$ (or $\log(\nu)$) can be given an appropriate hyperprior. For a random variable $Z$, it
is well known that the expected value and variance of this distribution are given by
$E(Z) = \lambda$ and $\text{Var}(Z) = \lambda + \nu\lambda^2$ (Greene, 2008), and thus, this
distribution readily accommodates processes where the
variance exceeds the mean. Finally, for this distribution, it is possible to let the dispersion
parameter be space or time varying; however, only overdispersion can be accommodated.
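The mean and variance relationship can be checked numerically. The sketch below assumes the standard "size and probability" form of the negative binomial with size $r = 1/\nu$ and success probability $r/(r + \lambda)$, which reproduces $E(Z) = \lambda$ and $\text{Var}(Z) = \lambda + \nu\lambda^2$; the function names and the truncation point `max_count` are hypothetical choices for illustration.

```python
import math

def negbin_log_pmf(z, lam, nu):
    """Log pmf of a negative binomial count with mean lam and dispersion nu > 0."""
    r = 1.0 / nu            # "size" parameter implied by the dispersion
    p = r / (r + lam)       # success probability implied by the mean
    return (math.lgamma(z + r) - math.lgamma(r) - math.lgamma(z + 1)
            + r * math.log(p) + z * math.log(1.0 - p))

def negbin_moments(lam, nu, max_count=5000):
    """Mean and variance by direct summation of the pmf, truncated at max_count."""
    probs = [math.exp(negbin_log_pmf(z, lam, nu)) for z in range(max_count)]
    mean = sum(z * pr for z, pr in enumerate(probs))
    variance = sum((z - mean) ** 2 * pr for z, pr in enumerate(probs))
    return mean, variance

lam = 4.0
for nu in (0.1, 0.5, 1.0):
    mean, variance = negbin_moments(lam, nu)
    print("nu:", nu, "mean:", mean, "variance:", variance,
          "lam + nu*lam^2:", lam + nu * lam ** 2)
```

Because $\nu > 0$, the variance always exceeds the mean under this parameterization, which is why only overdispersion can be captured.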
A less common distribution used to model count data is the CMP distribution. As dis-
cussed in Wu et al. (2013), the CMP distribution can be used as a suitable data model
distribution when considering count-valued spatio-temporal data. The advantage of this
data model distribution is that it flexibly allows for both spatial (or temporal)
overdispersion and underdispersion within the same model. Let $\boldsymbol{\lambda}_t$ and $\nu$
be positive and denote the CMP “intensity” and dispersion parameters, respectively. For this
distribution, $\nu = 1$ corresponds to the Poisson distribution, whereas $\nu < 1$ and $\nu > 1$
correspond, respectively, to overdispersed and underdispersed distributions. Further, the CMP
distribution reduces to the geometric distribution (when $\nu = 0$ and $\lambda < 1$) and the
Bernoulli distribution (as $\nu \to \infty$) in the limiting cases (Shmueli et al., 2005). A
spatio-temporal version of the CMP distribution is given by
$$\mathbf{Z}_t \mid \boldsymbol{\lambda}_t, \nu \sim \text{ind. CMP}(\mathbf{H}_t \boldsymbol{\lambda}_t, \nu),$$
where $\log(\boldsymbol{\lambda}_t)$ can be specified similarly to the right-hand side of (15.4)
and $\log(\nu)$ is given a suitable hyperprior. Alternatively, as proposed by Wu et al. (2013),
a dynamic model for the
dispersion parameter could be imposed. Importantly, this distribution involves a normal-
izing constant that must be computed numerically since it involves the summation of an
innite series. For certain combinations of intensity and dispersion parameters, calculation
of the normalizing constant can be computationally intensive. For these cases, Minka et al.
(2003) derived an asymptotic approximation to the normalizing constant which is accurate
for λ > 10
ν
. In contrast, Wu et al. (2013) proposed further improvements to computing the
normalizing constant by taking advantage of parallel computing through Open Multipro-
cessing (OpenMP) and Compute Unied Device Architecture (CUDA), that is, graphics
processing unit (GPU).
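The role of the normalizing constant can be made concrete with a small serial sketch. It assumes the usual CMP form in which the probability of a count $j$ is proportional to $\lambda^j / (j!)^{\nu}$, evaluates the series by brute-force truncation on the log scale, and compares it with the commonly quoted leading-order asymptotic form $\exp(\nu\lambda^{1/\nu}) / \{\lambda^{(\nu-1)/(2\nu)} (2\pi)^{(\nu-1)/2} \sqrt{\nu}\}$. That closed form is reproduced here from memory of the CMP literature and should be checked against the cited references; the function names and the truncation point `max_terms` are hypothetical, and nothing here reflects the parallel implementation of Wu et al. (2013).

```python
import math

def cmp_log_weights(lam, nu, max_terms=2000):
    """Log unnormalized CMP probabilities: j * log(lam) - nu * log(j!)."""
    return [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(max_terms)]

def log_sum_exp(values):
    largest = max(values)
    return largest + math.log(sum(math.exp(v - largest) for v in values))

def cmp_log_z(lam, nu, max_terms=2000):
    """Log normalizing constant from the truncated series."""
    return log_sum_exp(cmp_log_weights(lam, nu, max_terms))

def cmp_log_z_asymptotic(lam, nu):
    """Leading-order asymptotic approximation to log Z (requires nu > 0)."""
    return (nu * lam ** (1.0 / nu)
            - (nu - 1.0) / (2.0 * nu) * math.log(lam)
            - (nu - 1.0) / 2.0 * math.log(2.0 * math.pi)
            - 0.5 * math.log(nu))

def cmp_mean_var(lam, nu, max_terms=2000):
    """Mean and variance of the (truncated) CMP distribution."""
    log_w = cmp_log_weights(lam, nu, max_terms)
    log_z = log_sum_exp(log_w)
    probs = [math.exp(v - log_z) for v in log_w]
    mean = sum(j * pr for j, pr in enumerate(probs))
    variance = sum((j - mean) ** 2 * pr for j, pr in enumerate(probs))
    return mean, variance

for nu in (0.5, 1.0, 2.0):
    mean, variance = cmp_mean_var(4.0, nu)
    print("nu:", nu, "mean:", round(mean, 3), "variance:", round(variance, 3))
print("series log Z:", cmp_log_z(200.0, 2.0),
      "asymptotic log Z:", cmp_log_z_asymptotic(200.0, 2.0))
```

The loop illustrates the dispersion behavior described above (variance above the mean for $\nu < 1$, equal at $\nu = 1$, below for $\nu > 1$), and the final line compares the two evaluations of the constant in the regime $\lambda > 10^{\nu}$.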
15.4 Modeling Dynamics
Dynamic models have long been considered in the non-Gaussian time series context
(e.g., Carlin et al., 1992; Fahrmeir, 1992; Fahrmeir and Kaufmann, 1991; Gamerman, 1998;
Kitagawa, 1987; West et al., 1985). Such models often take a more “econometrics” flavor, in
which one seeks to accommodate multivariate temporal dependence through time-varying
parameters. A major distinction between dynamical spatio-temporal models (DSTMs) and
traditional multivariate time series models is that (1) there is spatial dependence (typically,
nonstationary in space and nonseparable in space and time), (2) the spatial process changing
through time is often of very high dimension, and (3) there is a scientific process that
drives the way in which this spatial dependence changes through time. That is, there is a
fundamental “process” that suggests modeling should be related to the evolution of a spatial
process through time rather than simply modeling correlated time series or specifying
marginal spatio-temporal dependence structures (Cressie and Wikle, 2011). We consider
this process-driven approach here, recognizing that it fits naturally in the aforementioned
hierarchical modeling framework. In this setting, we may be able to rely on fairly simple
evolution models (e.g., autoregressive processes), but the modeling is complicated by
high dimensionality and the need to reduce dimensionality either in terms of the process
or in terms of parameter reduction. In addition, one must consider the possibility of
modeling more complicated nonlinear scientific processes within the Markovian framework.
Here, we focus on the case of discrete space and time, but note that the continuous space,
discrete time case is closely related (e.g., Wikle and Cressie, 1999; Wikle and Holan, 2011;
Wikle, 2002).
One can evolve the spatio-temporal process $\mathbf{Y}_t$ using the standard approaches from
dynamical spatio-temporal models (e.g., Cressie and Wikle, 2011, Chapter 7) or through
more traditional econometric-based dynamic linear models (e.g., Gamerman et al., 2007;
Gelfand et al., 2005). However, as discussed in Cressie and Wikle (2011, Chapter 7), in
the spatio-temporal context, realistic dynamical evolution of these processes requires
transition (propagator) matrices that can accommodate real-world dynamical features (e.g.,
advection, diffusion, growth, etc.). Furthermore, the dimensionality of such processes often
makes specification of the transition matrix a formidable challenge in terms of the number
of parameters that must be estimated. Consequently, it is fairly typical to consider the
evolution of so-called “spatial random effects,” which are the projection coefficients of a
basis function expansion of $\mathbf{Y}_t$ (e.g., Wikle and Cressie, 1999). Typically, the
underlying dynamics of interest exist on a lower-dimensional manifold, allowing for a reduced
rank representation, which also serves to reduce the parameter space associated with the
process evolution. Thus, rather than model the spatio-temporal process $\mathbf{Y}_t$ directly,
it is often
convenient to consider the underlying spatio-temporal process to be decomposed into var-
ious components (e.g., Wikle et al., 2001; Wikle, 2003b; Wikle et al., 1998). For example,
consider
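$$\mathbf{Y}_t = \boldsymbol{\mu} + \boldsymbol{\Phi}^{(1)} \boldsymbol{\alpha}_t + \boldsymbol{\Phi}^{(2)} \boldsymbol{\beta}_t + \boldsymbol{\epsilon}_t,$$
where $\mathbf{Y}_t$ is an $n \times 1$ process vector defined at $n$ spatial locations of
interest, $\boldsymbol{\mu}$ is an $n \times 1$ spatial mean vector, $\boldsymbol{\Phi}^{(1)}$ is an
$n \times p_1$ matrix, $\boldsymbol{\Phi}^{(2)}$ is an $n \times p_2$ matrix, $\boldsymbol{\alpha}_t$
and $\boldsymbol{\beta}_t$ are $p_1$- and $p_2$-dimensional vectors, respectively, and
$\boldsymbol{\epsilon}_t$ is an $n \times 1$ mean zero spatial error process.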
In high-dimensional settings, $\boldsymbol{\Phi}^{(1)}$ is typically a “basis function” matrix,
with $\boldsymbol{\alpha}_t$ denoting the corresponding expansion coefficients. The choice of the
matrix $\boldsymbol{\Phi}^{(1)}$ in this context has been the source of considerable study in
recent years, with many choices available, depending on whether these basis functions are
specified (e.g., orthogonal polynomials, multiresolution wavelets or Wendland functions,
splines, empirical orthogonal functions (EOFs), etc.), or whether they are in some sense
estimated (e.g., discrete kernel convolutions, “predictive processes,” dynamic factor models,
etc.). Choices are typically made based on ideology, but should be made on more practical
considerations such as whether the basis set is full rank ($p_1 = n$), rank reduced (i.e.,
$p_1 \ll n$), or over-complete ($p_1 \gg n$), or whether one wishes the $\boldsymbol{\alpha}_t$
coefficients to be spatially referenced (as in the discrete kernel convolution and
“predictive process” approaches) or whether they live in “spectral” space. These issues are
discussed in depth in Wikle (2010) and Cressie and Wikle (2011, Chapter 7). Our perspective is
that these choices should consider the process dynamics, data, and computational demands of
the problem at hand.
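A rank-reduced, specified basis can be sketched in a few lines. The example below is a hypothetical illustration on a one-dimensional grid of $n$ locations using the first $p_1$ orthonormal cosine (DCT-II) functions; this particular basis is an assumption chosen for simplicity, not one singled out in the text. Because the columns are orthonormal, the expansion coefficients are obtained by simple projection.

```python
import math

def cosine_basis(n, p1):
    """n x p1 matrix whose columns are the first p1 orthonormal DCT-II cosine functions."""
    basis = []
    for i in range(n):
        row = []
        for j in range(p1):
            scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
            row.append(scale * math.cos(math.pi * (i + 0.5) * j / n))
        basis.append(row)
    return basis

def project(basis, field):
    """Expansion coefficients as the transposed basis times the field (orthonormal columns)."""
    n, p1 = len(basis), len(basis[0])
    return [sum(basis[i][j] * field[i] for i in range(n)) for j in range(p1)]

def reconstruct(basis, coefficients):
    """Field implied by the coefficients: basis matrix times coefficient vector."""
    return [sum(b * a for b, a in zip(row, coefficients)) for row in basis]

n, p1 = 50, 5                      # rank reduced: p1 is much smaller than n
basis = cosine_basis(n, p1)
# Hypothetical smooth field lying in the span of the first two basis functions.
field = [2.0 + math.cos(math.pi * (i + 0.5) / n) for i in range(n)]
alpha = project(basis, field)
field_hat = reconstruct(basis, alpha)
max_error = max(abs(a - b) for a, b in zip(field, field_hat))
print("coefficients:", [round(a, 4) for a in alpha])
print("maximum reconstruction error:", max_error)
```

Here 50 spatial values are summarized by 5 coefficients, so any dynamic model is then specified for a 5-dimensional rather than a 50-dimensional state.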
The choice of $\boldsymbol{\Phi}^{(2)}$ depends on the process $\mathbf{Y}_t$ and the choice of
$\boldsymbol{\Phi}^{(1)}$ as well as the computational demands of the problem of interest. For
example, if $\boldsymbol{\Phi}^{(1)}$ corresponds to a rank-reduced basis for a large-scale
dynamical process, then one might consider $\boldsymbol{\Phi}^{(2)}$ to correspond to smaller
scales, which may have different dynamics (e.g., Gladish and Wikle, 2014; Wikle et al., 2001).
Alternatively, $\boldsymbol{\Phi}^{(2)}$ may correspond to covariates, or may be an identity
matrix, in which case $\boldsymbol{\beta}_t$ are just “regression” coefficients or residual
random effects (likely confounded with $\nu_t$, the time-varying dispersion parameter, and
$\boldsymbol{\epsilon}_t$), respectively. Clearly, not all of these components are required or
useful in every spatio-temporal model; choices must be made relative to the process and data
at hand. We will focus the discussion here on process-based dynamic models for
$\boldsymbol{\alpha}_t$.
Let $\boldsymbol{\alpha}_t \equiv (\alpha_{1,t}, \ldots, \alpha_{p_1,t})'$, where, depending on
the choice of $\boldsymbol{\Phi}^{(1)}$, the index $i$ in $\alpha_{i,t}$ may correspond to either
physical space or “spectral” space. We are typically interested in a Markovian evolution model
such as $\boldsymbol{\alpha}_t = \mathcal{M}(\boldsymbol{\alpha}_{t-1}; \boldsymbol{\eta}_t; \boldsymbol{\theta})$,
$t = 1, 2, \ldots$, where $\mathcal{M}(\cdot)$ is an evolution operator, $\boldsymbol{\eta}_t$ an
error process, and $\boldsymbol{\theta}$ parameters (that may, themselves, vary over space and/or
time). Clearly, such a model is too general to be of much use beyond providing a conceptual
framework. Rather, we consider the very general parametric class of models suggested by
general quadratic nonlinearity (GQN) (Wikle and Holan, 2011; Wikle and Hooten, 2010):
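$$\alpha_{i,t} = \sum_{j=1}^{p_1} m^{L}_{i,j,t}\, \alpha_{j,t-1} + \sum_{k=1}^{p_1} \sum_{\ell=1}^{p_1} m^{Q}_{i,k\ell}\, \alpha_{k,t-1}\, g(\alpha_{\ell,t-1}; \boldsymbol{\theta}_g) + \eta_{i,t}, \tag{15.6}$$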
for $i = 1, \ldots, p_1$, where $\eta_{i,t}$ is an error process (typically assumed to be a mean
zero Gaussian process with some variance–covariance matrix given by $\mathbf{Q}_{\alpha}$),
$m^{L}_{i,j,t}$ are linear interaction (transition) coefficients, $m^{Q}_{i,k\ell}$ are quadratic
interaction coefficients, and $g(\cdot)$ is some transformation of $\alpha_{\ell,t}$ that depends
on parameters $\boldsymbol{\theta}_g$ and gives the process more generality than the simple dyadic
interactions in the $\alpha$ coefficients alone. As described in Wikle and Hooten (2010), this
framework is exceptionally flexible in that it can account for an extensive set of real-world
mechanistic processes. Wikle and Holan (2011) show that this extends to higher-order
interactions and the integro-difference continuous space case. However, even with quadratic
interactions, the number of parameters that need to be estimated is on the order of $p_1^3$,
which is a substantial curse of dimensionality. The efficient parameterization of (15.6)
becomes the principal challenge in DSTM specification.
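To make the structure of (15.6) and its parameter burden concrete, the following minimal sketch performs one GQN update and counts the interaction coefficients. It assumes time-invariant linear coefficients, an identity transformation for $g(\cdot)$ in the example, and nested Python lists as a stand-in for arrays; all function and variable names are hypothetical.

```python
def gqn_step(alpha_prev, m_lin, m_quad, g, eta):
    """One update of the general quadratic nonlinear model for all i = 1, ..., p1.

    m_lin[i][j] holds the linear coefficients, m_quad[i][k][l] the quadratic
    coefficients, g is the transformation applied to the lagged state, and eta
    is the error vector for this time step.
    """
    p1 = len(alpha_prev)
    g_values = [g(a) for a in alpha_prev]
    alpha_next = []
    for i in range(p1):
        linear = sum(m_lin[i][j] * alpha_prev[j] for j in range(p1))
        quadratic = sum(m_quad[i][k][l] * alpha_prev[k] * g_values[l]
                        for k in range(p1) for l in range(p1))
        alpha_next.append(linear + quadratic + eta[i])
    return alpha_next

def gqn_parameter_count(p1):
    """Number of linear (p1^2) plus quadratic (p1^3) interaction coefficients."""
    return p1 ** 2 + p1 ** 3

# Hypothetical two-dimensional example with a single nonzero quadratic term.
m_lin = [[0.5, 0.1], [0.0, 0.3]]
m_quad = [[[0.0, 0.1], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
state = gqn_step([1.0, 2.0], m_lin, m_quad, g=lambda a: a, eta=[0.0, 0.0])
print("updated state:", state)
for p1 in (5, 10, 50):
    print("p1:", p1, "interaction coefficients:", gqn_parameter_count(p1))
```

Even a modest reduced dimension such as $p_1 = 50$ implies more than $10^5$ coefficients, which is the motivation for the structured parameterizations discussed next.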
There are several simplifications and modeling approaches that can facilitate the
specification of the parameter structure in (15.6). First, one can use the structure suggested
by discretization of relevant mechanistic models (e.g., partial differential equations such
as those given by reaction–diffusion and advection–diffusion processes) to simplify the
parameters given by the linear and quadratic interaction coefficients in both physical and
Galerkin (spectral) space (e.g., Hooten and Wikle, 2008; Wikle, 2003a; Wikle and Hooten,
2010; Wikle et al., 2001; Xu and Wikle, 2007). It is important to recognize that one uses
these mathematical model representations to reduce the number of parameters, but this
leaves many parameters that still must be estimated or modeled. Thus, such an approach
is referred to as a mechanistically motivated model (e.g., see tutorial discussion in Cressie
and Wikle, 2011, Chapters 6 and 7).
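As a simplified illustration of a mechanistically motivated parameterization, the sketch below builds the transition matrix implied by an explicit finite-difference step of a one-dimensional diffusion equation with a constant diffusion coefficient. The setting is an assumption chosen for brevity (the cited work treats richer reaction–diffusion and advection–diffusion models, often with spatially varying parameters); `delta` stands for the hypothetical combination of diffusion coefficient, time step, and grid spacing, and values outside the grid are taken to be zero.

```python
def diffusion_transition_matrix(n, delta):
    """Tridiagonal n x n transition matrix from a finite-difference diffusion step.

    Each location keeps a fraction 1 - 2 * delta of its own value and receives
    a fraction delta from each immediate neighbor.
    """
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0 - 2.0 * delta
        if i > 0:
            matrix[i][i - 1] = delta
        if i < n - 1:
            matrix[i][i + 1] = delta
    return matrix

n = 5
transition = diffusion_transition_matrix(n, delta=0.2)
for row in transition:
    print([round(value, 2) for value in row])
print("matrix entries:", n * n, "free parameters:", 1)
```

All $n^2$ entries of the matrix are governed by a single parameter in this sketch; allowing the diffusion coefficient to vary over space would restore some flexibility at the cost of more parameters to estimate or model.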
In situations where $\boldsymbol{\Phi}^{(1)}$ and $\boldsymbol{\Phi}^{(2)}$ correspond to large and
medium/small-scale spatial basis functions, respectively, Wikle and Hooten (2010) make a case
for dimension reduction based on arguments from turbulence theory. Wikle and Holan (2011) show
that estimation and Bayesian inference can be substantially improved if one applies a
stochastic search variable selection (e.g., George and McCulloch, 1993, 1997) approach to the
linear and quadratic interaction parameters. Gladish and Wikle (2014) show that one can also
effectively reduce the parameter space in this scenario if one assumes that medium scales
influence the evolution of the large scales, but large scales do not influence the evolution
of medium scales, which is also motivated by certain types of physical processes.
Critically, many real-world processes are very reasonably approximated by linear or
quasi-linear dynamics, that is, the case where the quadratic interaction coefficients in (15.6)
are zero. In this case, the model reduces to a vector autoregressive model with time-varying
coefficients,
$$\boldsymbol{\alpha}_t = \mathbf{M}_t \boldsymbol{\alpha}_{t-1} + \boldsymbol{\eta}_t, \tag{15.7}$$
where $\mathbf{M}_t = \{m^{L}_{i,j,t}\}_{i,j}$ is the $p_1 \times p_1$ time-varying transition
matrix and $\boldsymbol{\eta}_t \equiv (\eta_{1,t}, \ldots, \eta_{p_1,t})' \sim \text{Gau}(\mathbf{0}, \mathbf{Q}_t)$
is the error process. Of course, the model given in (15.7) is just a multivariate
autoregressive time series process. However, as stated previously, in dynamic spatio-temporal
process modeling, the dimensionality of $\mathbf{M}_t$ may preclude direct estimation, and
more critically, one should take into account the mechanistic nature of the dynamical
evolution in the parameterization of this matrix. The approaches discussed earlier associated
with mechanistically motivated specifications, spectral parameter reduction, and stochastic
search variable selection can be used to facilitate this modeling. Importantly, it is often
the case in real-world processes that a static transition matrix ($\mathbf{M}_t \equiv \mathbf{M}$)
is adequate for modeling the spatio-temporal dependence.
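A minimal simulation of (15.7) with a static transition matrix might look as follows. The sketch assumes independent errors with a common standard deviation (that is, an error covariance proportional to the identity), which is a simplification of the general $\mathbf{Q}_t$; the function name, the example matrix, and the initial state are hypothetical.

```python
import random

def simulate_var1(transition, alpha0, n_steps, noise_sd, rng):
    """Simulate a first-order vector autoregression with a static transition matrix.

    Assumption: the errors are independent Gaussian with standard deviation noise_sd.
    Returns the list of states, starting with alpha0.
    """
    p1 = len(alpha0)
    path = [list(alpha0)]
    for _ in range(n_steps):
        previous = path[-1]
        current = [sum(transition[i][j] * previous[j] for j in range(p1))
                   + rng.gauss(0.0, noise_sd)
                   for i in range(p1)]
        path.append(current)
    return path

transition = [[0.5, 0.2], [0.0, 0.3]]   # hypothetical static transition matrix
rng = random.Random(1)
path = simulate_var1(transition, alpha0=[1.0, 1.0], n_steps=5, noise_sd=0.1, rng=rng)
for t, state in enumerate(path):
    print("t:", t, "state:", [round(value, 3) for value in state])
```

Setting `noise_sd` to zero recovers the deterministic evolution under repeated application of the transition matrix, which can be a useful check on any structured parameterization of that matrix.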