Hierarchical Dynamic Generalized Linear Mixed Models
by Wikle et al. (1998), Wikle (2003b), Cressie et al. (2009), and Wikle et al. (2013), among
others. This approach is conceptually straightforward and it provides an extremely rich
framework for modeling complex dependence structures in the context of discrete-valued
spatio-temporal processes. Importantly, in addition to the process-based emphasis, the
hierarchical framework presented here also emphasizes modeling parameters, which is
often not the case in nested error regression-type hierarchical models.
Although the hierarchical modeling paradigm has become fairly well established (e.g.,
see Cressie and Wikle, 2011, and the references therein), we provide a brief description
here for those readers less familiar with these ideas. The main idea underlying the BHMs
presented here is to consider a joint probability model for the data, process, and
parameters, which are generally specified through conditionally linked model components; that
is, the data conditioned on the process and parameters and the process conditioned on the
parameters. Several references focus on this type of hierarchical thinking including Royle
and Dorazio (2008) and Cressie and Wikle (2011), among others, whereas more traditional
presentations of hierarchical modeling can be found in Banerjee et al. (2003), Carlin and Louis
(2011), Gelman et al. (2013), and the references therein.
Synthesis and effective utilization of information, both from direct and from indirect
sources, are two paramount objectives in statistical modeling and data analysis. In fact,
both direct and indirect sources of information play a key role in statistical modeling and
often include expert opinion, physical laws, and previous empirical results. For specificity,
consider the case where we have an underlying scientific process of interest, denoted by
Y (a spatio-temporal process). Associated with this process we also have observed data,
say Z. We assume that we have parameters $\theta_Z$ associated with the measurement process
Z that might account for differences in the support and representativeness between Z
and the underlying true process Y defined at a given resolution of interest. Additionally,
we assume that there are some parameters $\theta_Y$, typically associated with the evolution
operator and innovation covariances, that describe the dynamics of the true underlying
process of interest, Y.
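As a minimal numerical sketch of this setup (the AR(1) evolution, the Poisson data model, and all numerical values are hypothetical choices for illustration, not specifications from the text), the following simulates a latent spatio-temporal process Y governed by $\theta_Y$ and count-valued observations Z governed by $\theta_Z$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: theta_Y governs the latent process dynamics,
# theta_Z governs the measurement process.
theta_Y = {"evolution": 0.8, "innovation_sd": 0.3}   # dynamics of Y
theta_Z = {"offset": 1.0}                            # measurement model for Z

T, n = 50, 10   # number of time points and spatial locations

# Process model [Y | theta_Y]: a simple vector AR(1) latent process
Y = np.zeros((T, n))
for t in range(1, T):
    Y[t] = (theta_Y["evolution"] * Y[t - 1]
            + rng.normal(0.0, theta_Y["innovation_sd"], size=n))

# Data model [Z | Y, theta_Z]: conditionally independent Poisson counts,
# with the latent process entering on the log-rate scale
Z = rng.poisson(np.exp(theta_Z["offset"] + Y))

print(Z.shape)  # (50, 10)
```

Here the evolution coefficient and innovation standard deviation stand in for $\theta_Y$, and the measurement offset for $\theta_Z$; any real application would replace these with estimated or scientifically motivated quantities.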
Let [Z|Y] and [Y] denote the conditional distribution of Z given Y and the marginal distri-
bution of Y, respectively. Then, assuming conditional independence of the parameters and
using the law of total probability, the joint probability distribution of the data and process
given the parameters can be decomposed as
$$[Z, Y \mid \theta_Z, \theta_Y] = [Z \mid Y, \theta_Z]\,[Y \mid \theta_Y], \qquad (15.1)$$
where $[Z \mid Y, \theta_Z]$ is the data distribution (or "data model," assuming conditional
independence) and $[Y \mid \theta_Y]$ denotes the process distribution (or "process model").
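The factorization in (15.1) can be sketched numerically. In this hypothetical one-dimensional example (a Gaussian process model and a Poisson data model, both chosen purely for illustration), the joint log density of the data and process given the parameters is the sum of the data-model and process-model log densities:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(1)

# Hypothetical process-model parameters theta_Y (Gaussian latent Y);
# the data model is Z | Y ~ Poisson(exp(Y)), so theta_Z is trivial here.
theta_Y = {"mean": 0.0, "sd": 1.0}

def log_process(y, theta_Y):
    """Log of the process model [Y | theta_Y] (Gaussian density)."""
    return (-0.5 * np.log(2 * np.pi * theta_Y["sd"] ** 2)
            - 0.5 * ((y - theta_Y["mean"]) / theta_Y["sd"]) ** 2)

def log_data(z, y):
    """Log of the data model [Z | Y, theta_Z] (Poisson pmf, rate exp(y))."""
    lam = np.exp(y)
    return z * np.log(lam) - lam - lgamma(z + 1)

# Draw one realization of the process and the data
y = rng.normal(theta_Y["mean"], theta_Y["sd"])
z = rng.poisson(np.exp(y))

# Equation (15.1) on the log scale: joint = data model + process model
log_joint = log_data(z, y) + log_process(y, theta_Y)
print(log_joint)
```

The conditional-independence assumption in the data model is what lets the data log density decompose into a sum over observations in the multivariate case.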
In traditional statistics, the data Z are typically given some specified distributional form
along with associated parameters $\theta = (\theta_Z, \theta_Y)$ corresponding to the spatio-temporal mean,
variances, and covariances. Although distributional assumptions for Z can be relaxed in the
context of discrete-valued spatio-temporal data (e.g., Kottas et al., 2008), we limit our
discussion to parametric models and focus primarily on count-valued spatio-temporal data.
Integrating out the random process Y in (15.1) results in [Z|θ], in which case interest resides
in estimating the parameters given the data. The disadvantage of such estimation is that
it eliminates explicit estimation of the underlying true latent process Y. Instead, the
distribution for Y is implicitly included through the first and second moments as a result
of the integration.
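A small Monte Carlo sketch, under a hypothetical Poisson-lognormal specification (Gaussian Y on the log scale, conditionally Poisson Z; all values chosen for illustration), shows how integrating out Y leaves its influence only in the marginal moments of $[Z \mid \theta]$, visible as overdispersion relative to a Poisson:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical process-model parameters theta_Y
mu, sd = 0.5, 0.4
N = 200_000                 # Monte Carlo draws

Y = rng.normal(mu, sd, size=N)     # draws from [Y | theta_Y]
Z = rng.poisson(np.exp(Y))         # draws from [Z | Y, theta_Z]

# Closed-form moments of the marginal [Z | theta] after integrating out Y:
# E[Z] = exp(mu + sd^2/2),  Var[Z] = E[Z] + (exp(sd^2) - 1) E[Z]^2
mean_closed = np.exp(mu + sd ** 2 / 2)
var_closed = mean_closed + (np.exp(sd ** 2) - 1.0) * mean_closed ** 2

print(Z.mean(), mean_closed)   # Monte Carlo mean vs closed form
print(Z.var(), var_closed)     # variance exceeds the mean (overdispersion)
```

The latent process no longer appears explicitly in the marginal model, but its evolution variance inflates the second moment of Z, which is precisely the sense in which Y is "implicitly included" after integration.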