354 Handbook of Discrete-Valued Time Series
terminology, we might use the term “node” instead of agent; for example, in a similar setting a virus could spread dynamically through a computer network, with servers acting as the agents.
Desirable features of an ABM that might be used to model the rabies epidemic are (1) the flexibility to allow for long-distance dispersal events and (2) a more typical diffusion-type process that accounts for the smoother, yet heterogeneous, spread of disease from neighbor to neighbor. Thus, consider the statistical data model for the rabies observations $y_{i,t}$ collected at grid cells $i = 1, \ldots, n$ for times $t = 1, \ldots, T$:
$$y_{i,t} \sim \text{Bern}(\theta_{i,t}), \tag{16.1}$$
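where the presence probabilities $\theta_{i,t}$ are assumed to control the observed process and are modeled as
$$\theta_{i,t} = \begin{cases} \phi, & y_{i,t-1} = 1, \\ \bar{p}_{i,t}, & (1 - y_{i,t-1})\, I_{\mathcal{N}_{i,t-1}} = 1, \\ \psi, & (1 - I_{\mathcal{N}_{i,t-1}}) = 1, \end{cases} \tag{16.2}$$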
such that they depend on a set of indicator variables that identify when a given cell can be influenced by the neighboring cells (i.e., cells in the set $\mathcal{N}_{i,t-1}$, in this case the first- and second-order grid neighbors) that happen to be active at the previous time:
$$I_{\mathcal{N}_{i,t-1}} = \begin{cases} 1, & \sum_{j \in \mathcal{N}_i} y_{j,t-1} > 0, \\ 0, & \sum_{j \in \mathcal{N}_i} y_{j,t-1} = 0. \end{cases} \tag{16.3}$$
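As a minimal sketch of how (16.2) and (16.3) could be evaluated for one time step on a grid, the Python code below computes the neighborhood indicator and the resulting presence probabilities. It assumes that the “first- and second-order grid neighbors” form the 3 × 3 (queen) neighborhood with no wrap-around at the edges, takes the near-distance probabilities $\bar{p}_{i,t}$ as a given array, and gives persistence precedence when an occupied cell has no active neighbors (a case in which the first and third conditions of (16.2) both hold). The function names are hypothetical.

```python
import numpy as np

# Assumed neighborhood N_i: the 8 cells of the 3x3 (queen) window, no wrap-around.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def neighbor_indicator(y_prev: np.ndarray) -> np.ndarray:
    """I_{N_i,t-1} from (16.3): 1 if any neighbor of cell i was occupied at t - 1."""
    padded = np.pad(y_prev, 1)  # zero padding: off-grid cells count as unoccupied
    rows, cols = y_prev.shape
    total = np.zeros(y_prev.shape, dtype=int)
    for dr, dc in OFFSETS:
        # Shifted view: total[r, c] accumulates y_prev[r + dr, c + dc].
        total += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return (total > 0).astype(int)


def presence_prob(y_prev, pbar, phi, psi):
    """theta_{i,t} from (16.2); persistence (phi) takes precedence for occupied cells."""
    indicator = neighbor_indicator(y_prev)
    return np.where(y_prev == 1, phi, np.where(indicator == 1, pbar, psi))
```

With a single occupied cell in the middle of a 5 × 5 grid, its eight neighbors receive $\bar{p}_{i,t}$, the occupied cell receives $\phi$, and every other cell receives $\psi$.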
Thus, the data depend on parameters $\theta_{i,t}$, which can take one of three forms in (16.2): (1) persistence ($\phi$), (2) short-distance dispersal ($\bar{p}_{i,t}$), or (3) long-distance dispersal ($\psi$). If the processes were fully persistent and not capable of long-distance dispersal, then only the middle term would be necessary in (16.2). It is this near-distance dispersal process ($\bar{p}_{i,t}$) that is the real workhorse of the model, allowing for “intelligent” spreading dynamics to occur. There are a number of ways one could specify the dynamics for the near-distance dispersal probabilities $\bar{p}_{i,t}$. We take an approach similar to that described by Hooten and Wikle (2010), where, given the disease status for the neighborhood at the previous time, $y_{\mathcal{N}_i,t-1} \equiv \{y_{j,t-1} : s_j \in \mathcal{N}_i\}$, and an additional set of neighborhood interaction probabilities $p \equiv (p_1, p_2, \ldots, p_{|\mathcal{N}_i|})'$, we have the relationship
$$\bar{p}_{i,t} = 1 - \exp\left(y_{\mathcal{N}_i,t-1}' \log(1 - p)\right), \tag{16.4}$$
with the logarithm applied elementwise.
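To make (16.4) concrete, the sketch below evaluates $\bar{p}_{i,t}$ for a single cell, assuming the neighbor statuses $y_{\mathcal{N}_i,t-1}$ and the interaction probabilities $p$ are stored as arrays in the same neighbor order. The function name is hypothetical.

```python
import numpy as np


def near_dispersal_prob(y_neighbors: np.ndarray, p: np.ndarray) -> float:
    """pbar_{i,t} = 1 - exp(y' log(1 - p)) from (16.4).

    y_neighbors: 0/1 statuses of the |N_i| neighbors at t - 1 (fixed order).
    p: interaction probabilities in the same order, each strictly below 1.
    """
    # log1p(-p) computes log(1 - p) elementwise; the dot product keeps only
    # the terms for occupied neighbors.
    return 1.0 - np.exp(y_neighbors @ np.log1p(-p))
```

For example, with $p = (0.1, 0.2, 0.3, 0.4)'$ and only the first and third neighbors occupied, the function returns $1 - 0.9 \times 0.7 = 0.37$ (up to floating-point rounding), and it returns 0 when no neighbor is occupied.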
This process model (16.4) implies that the probability of disease presence in a given cell is a function of the incoming disease from neighboring cells at the previous time. Specifically, the probability of presence at time $t$ in a previously unoccupied area is the probability of the union of the transitions from its occupied neighbors: if each occupied neighbor $j$ transmits independently with probability $p_j$, the cell escapes infection with probability $\prod_{j \in \mathcal{N}_i} (1 - p_j)^{y_{j,t-1}}$, and (16.4) is one minus this product written on the log scale. If we assume the interaction probabilities sum to 1, then a natural stochastic model for $p$ is
$$p \sim \text{Dirichlet}(a), \tag{16.5}$$
where the hyperparameters $a$ could be chosen to represent any prior knowledge pertaining to a bias in dispersal direction. Alternatively, both $p$ and $a$ could be allowed to vary either spatially or temporally. Hooten and Wikle (2010) describe a model that allows for
Hierarchical Agent-Based Spatio-Temporal Dynamic Models for Discrete-Valued Data 355
spatial heterogeneity in the interaction probabilities $p$ that correlate with the gradient of a potential field. To briefly describe how this generalization could be implemented, consider a potential field $\alpha(X)$ on the grid that depends on a set of variables $X$ influencing the spread of disease. One possible hypothesis for disease spread might be that it responds to changes in the potential field $\alpha$; for example, an abrupt change in a landscape feature, such as a boundary between land and water, would lead to an increase in the gradient perpendicular to the direction of the boundary. On a discrete spatial support, such as the one we are considering here for the rabies example, the first-order gradient could be summarized as a set of velocities $a_i$, one in each direction from the cell of interest $i$ to its neighboring cells in $\mathcal{N}_i$. These gradient values $a_i$ (or some function of them, because Dirichlet hyperparameters must be positive) can then serve as hyperparameters in the Dirichlet model for the spatially varying interaction probabilities $p_i$. Of course, the Dirichlet is only one way to model $p_i$; other stochastic models or deterministic functions could be used to link $p_i$ to $a_i$ or $X$.
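The gradient-based generalization can be sketched as follows, under assumptions that are ours rather than Hooten and Wikle's: the 3 × 3 neighborhood, first-order differences $\alpha(\text{neighbor}) - \alpha(i)$ as the directional velocities, and a hypothetical exponential link $a_i = \exp(\kappa \cdot \text{gradient})$ that keeps the Dirichlet hyperparameters positive. The names `directional_gradients`, `sample_interaction_probs`, and `kappa` are illustrative.

```python
import numpy as np

# Assumed neighborhood directions (3x3 window), in a fixed order.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def directional_gradients(alpha: np.ndarray, r: int, c: int) -> np.ndarray:
    """First-order differences alpha(neighbor) - alpha(i) for an interior cell i = (r, c)."""
    return np.array([alpha[r + dr, c + dc] - alpha[r, c] for dr, dc in OFFSETS])


def sample_interaction_probs(alpha, r, c, kappa=1.0, rng=None):
    """Draw p_i ~ Dirichlet(a_i) with a_i = exp(kappa * gradient) (hypothetical link)."""
    rng = np.random.default_rng() if rng is None else rng
    a_i = np.exp(kappa * directional_gradients(alpha, r, c))  # strictly positive
    return a_i, rng.dirichlet(a_i)
```

With $\kappa > 0$, dispersal is favored toward neighbors with larger potential; a negative $\kappa$ reverses that preference. The function handles interior cells only, since negative indices at the grid edge would silently wrap around.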
Returning now to the model described in (16.1)–(16.5), we can express the posterior distribution for all unknowns as proportional to the conditionally factored joint distribution:
$$[p, a, \phi, \psi \mid \{y_{i,t}, \forall i, t\}] \propto \left\{\prod_{t=1}^{T} \prod_{i=1}^{n} [y_{i,t} \mid \theta_{i,t}(p, \phi, \psi)]\right\} [p \mid a][a][\phi][\psi]. \tag{16.6}$$
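A minimal sketch of the unnormalized log posterior (16.6) follows. The priors $[a]$, $[\phi]$, and $[\psi]$ are not specified in the text, so the Exp(1) and Uniform(0, 1) choices below are hypothetical placeholders. The sketch also assumes a 3 × 3 neighborhood, a single $p$ shared by all cells, and treats the first time slice as a known initial condition; a function like this could serve as the target density in a Metropolis–Hastings sampler.

```python
import numpy as np
from scipy.stats import beta, dirichlet, expon

# Assumed neighborhood: the 8 cells of the 3x3 (queen) window around each cell.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def neighbor_stack(y_prev):
    """Neighbor statuses as an array of shape (8, rows, cols); off-grid cells count as 0."""
    padded = np.pad(y_prev, 1)
    rows, cols = y_prev.shape
    return np.stack([padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
                     for dr, dc in OFFSETS])


def theta_grid(y_prev, p, phi, psi):
    """theta_{i,t} for every cell via (16.2)-(16.4), with one p shared by all cells."""
    nb = neighbor_stack(y_prev)
    pbar = 1.0 - np.exp(np.tensordot(np.log1p(-p), nb, axes=1))    # (16.4)
    active = nb.sum(axis=0) > 0                                     # (16.3)
    return np.where(y_prev == 1, phi, np.where(active, pbar, psi))  # (16.2)


def log_posterior(y, p, a, phi, psi):
    """Unnormalized log of (16.6); y has shape (T + 1, rows, cols) and y[0] is fixed."""
    log_lik = 0.0
    for t in range(1, y.shape[0]):
        # Clip to avoid log(0) when theta is exactly 0 or 1.
        theta = np.clip(theta_grid(y[t - 1], p, phi, psi), 1e-12, 1 - 1e-12)
        log_lik += np.sum(y[t] * np.log(theta) + (1 - y[t]) * np.log1p(-theta))
    log_prior = (dirichlet.logpdf(p, a)      # [p | a], from (16.5)
                 + expon.logpdf(a).sum()     # [a]: hypothetical Exp(1) prior
                 + beta.logpdf(phi, 1, 1)    # [phi]: hypothetical Uniform(0, 1)
                 + beta.logpdf(psi, 1, 1))   # [psi]: hypothetical Uniform(0, 1)
    return log_lik + log_prior
```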
Also, recall that the presence probabilities $\theta_{i,t}$ are a function of the other model parameters; thus, inference can be obtained for them as derived quantities in the model. For example, the posterior mean and standard deviation of the space–time series for $\theta_{i,t}$ are shown in Figure 16.2. The left panel of Figure 16.2 shows the posterior mean for $\theta_{i,t}$ and gives us a quantitative understanding of presence probability along the front of the epidemic over time. Similarly, the posterior standard deviation for $\theta_{i,t}$, shown in the right panel of Figure 16.2, allows us to visualize the uncertainty pertaining to presence probability for all grid cells and times. Such statistical products could be used for short-term forecasting, for example, forecasting presence for the next year. Such forecasts are practical only when the forecast horizon is long enough to accommodate the computation time required to fit the model and produce them.
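Because $\theta_{i,t}$ is a nonlinear function of $(p, \phi, \psi)$, its posterior summaries are obtained by evaluating $\theta_{i,t}$ at every posterior draw and then summarizing, rather than by plugging posterior means of the parameters into (16.2). The generic helper below is a sketch of that step; the draw format and the `derive` callable are hypothetical.

```python
import numpy as np


def summarize_derived(draws, derive):
    """Posterior mean and SD of a derived quantity.

    draws: iterable of parameter draws (e.g., MCMC output for p, phi, psi).
    derive: function mapping one draw to the derived array (e.g., all theta_{i,t}).
    """
    values = np.stack([derive(d) for d in draws])  # shape (S, ...)
    return values.mean(axis=0), values.std(axis=0, ddof=1)
```

Maps like the two panels of Figure 16.2 correspond to the two arrays this function returns when `derive` produces the full space–time array of $\theta_{i,t}$.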
Overall, the relatively simple statistical ABM presented in (16.1)–(16.5) represents a fundamentally different approach to modeling dynamics (i.e., bottom-up rather than top-down) that is quite general and capable of representing complicated dynamics based on only a simple set of rules describing the behavior among agents. As a reminder, in this example we let the grid cells act as agents, but there is no reason why individual raccoons could not serve as agents being explicitly modeled if sufficient individual-level data were collected to support the desired statistical inference. Scale is a critical component of all models, but it seems especially important in statistical ABMs because we seek to invert the models and are thus limited by the scale on which the data were collected.
16.4 Hierarchical First-Order Emulators and ABMs
Parameter estimation, calibration, and validation can be difficult in deterministic and stochastic ABMs given the computational cost of simulation and, in most cases, the nonlinear relationships in the parameters. The use of statistical emulators (or surrogates)
FIGURE 16.2 Posterior mean (a) and standard deviation (b) for the presence probabilities ($\theta_{i,t}$) for raccoon rabies in Connecticut, USA, during 1991–1995. Time for each image increases from top to bottom and left to right in that order. Left panel values range from zero (white) to one (black); right panel values range from zero (white) to 0.25 (black).
for complicated computer simulation models has, in recent years, proven useful to address these issues (e.g., Currin et al., 1991; Kennedy and O’Hagan, 2001; Sacks et al., 1989). Emulators act as a fast approximation to the computer simulation model and, because they are statistical models, are well suited for inclusion within a Bayesian framework to perform uncertainty analysis, sensitivity analysis (O’Hagan, 2006), and model calibration and prediction (e.g., Higdon et al., 2004, 2008). That is, they allow a fairly high-fidelity representation of the simulation model at a fraction of the computational cost. Statistical emulators have most often been implemented through second-order (covariance) Gaussian process model specifications similar to geostatistical modeling (e.g., Kennedy and O’Hagan, 2001; O’Hagan, 2006; Sacks et al., 1989). Second-order emulators have recently been used to assess parameter sensitivity and estimation in ABMs (e.g., Dancik et al., 2010; Parry et al., 2013). As described in Hooten et al. (2011), it is often desirable to model the input–output relationship for a mechanistic model by using first-order (mean) characteristics rather than covariance structure (e.g., Frolov et al., 2009; Leeds et al., 2013, 2014; van der Merwe et al., 2007). In what follows, we provide a simple example to illustrate the use of first-order emulators to perform parameter estimation in the ABM context.
Assume we have a limited number of runs of the ABM simulation at input settings $\theta^{(1)}, \ldots, \theta^{(N)}$ and associated output $y^{(1)}, \ldots, y^{(N)}$, where $y^{(i)}$ is a $T$-dimensional vector. Here, “input settings” may refer to model parameters, forcings, or even past values of the process state. We then seek a surrogate statistical model that predicts $y^*$ at (untried)
input settings $\theta^*$. Following the first-order approach outlined in Hooten et al. (2011), we let $Y \equiv (y^{(1)}, \ldots, y^{(N)})$ and consider the singular value decomposition $Y = UDV'$, where $U$ is a $T \times T$ matrix of left singular vectors, $D$ is a $T \times N$ matrix with the singular values along the principal diagonal, and $V$ is an $N \times N$ matrix of right singular vectors. We then reduce the dimension by truncating the singular value decomposition matrices so that only the first $q$ singular vectors and singular values are considered (e.g., selecting $q$ to account for a significant portion of the variation in the output matrix $Y$). We denote these truncated matrices by $\tilde{U}$, $\tilde{D}$, and $\tilde{V}$, so $\tilde{U}\tilde{D}$ is a $T \times q$ matrix and $\tilde{V}$ is an $N \times q$ matrix. We then seek to link the parameters to the model output through the right singular vectors, $\tilde{V}$. Specifically, note that the $i$th parameter vector $\theta^{(i)}$ corresponds to the $i$th row of $\tilde{V}$, which we denote $v^{(i)}$. Thus, the $i$th output vector is approximated by $y^{(i)} \approx \tilde{U}\tilde{D}\, v^{(i)}$. We model these right singular vectors in terms of the parameters such that $v^{(i)} = g(\theta^{(i)}, \alpha)$, where $g(\cdot)$ is some function of the input parameter vector $\theta^{(i)}$ and parameter vector $\alpha$. We then model $y^{(i)} = \tilde{U}\tilde{D}\, g(\theta^{(i)}, \alpha) + \eta_i$, where $\eta_i$ is an error process. Typically, this error process, which accounts for truncation and emulator model representativeness, is assumed to be Gaussian with mean zero and variance–covariance matrix $\Sigma_\eta$. Upon estimating the parameters $\alpha$ from the model simulations, for a given input parameter vector $\theta^*$ we can predict the model output vector by $\hat{y}^* = \tilde{U}\tilde{D}\, g(\theta^*, \hat{\alpha})$. Hooten et al. (2011) consider both linear and nonlinear (random forest) functions for $g(\cdot)$, and Leeds et al. (2014) consider a parametric quadratic nonlinear model for $g(\cdot)$ as developed in Wikle and Hooten (2010) and Wikle and Holan (2011).
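The following Python sketch of a first-order emulator follows these steps with a linear $g(\cdot)$, one of the options Hooten et al. (2011) consider: an SVD of the $T \times N$ output matrix, truncation to $q$ components, least-squares regression of the rows of $\tilde{V}$ on the inputs, and prediction through $\tilde{U}\tilde{D}\, g(\theta^*, \hat{\alpha})$. The intercept-plus-linear form of $g$, the absence of any centering of $Y$, and the function name are assumptions for illustration.

```python
import numpy as np


def fit_first_order_emulator(thetas, Y, q):
    """SVD-based first-order emulator with a linear g (hypothetical helper).

    thetas: (N, k) input settings; Y: (T, N) matrix whose columns are outputs y^(i).
    Returns a function predicting the T-vector output at an untried theta*.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    UD = U[:, :q] * s[:q]            # U~ D~, shape (T, q)
    V = Vt[:q].T                     # V~, shape (N, q); row i is v^(i)
    X = np.column_stack([np.ones(len(thetas)), thetas])  # g(theta, alpha) = [1, theta'] alpha
    alpha_hat, *_ = np.linalg.lstsq(X, V, rcond=None)    # least-squares estimate of alpha

    def predict(theta_star):
        v_star = np.concatenate([[1.0], np.atleast_1d(theta_star)]) @ alpha_hat
        return UD @ v_star           # y* ~= U~ D~ g(theta*, alpha_hat)

    return predict
```

A nonlinear $g(\cdot)$, such as a random forest or the quadratic model mentioned above, would replace only the least-squares step; the SVD and the final projection through $\tilde{U}\tilde{D}$ stay the same.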
In a situation for which one has observation data corresponding to the simulation output vector, a Bayesian approach can be applied to efficiently obtain the posterior parameter distribution for $\theta$ by using the emulator in place of the simulation model. That is, given data vector $y_D$, parameter estimates $\hat{\alpha}$, and prior distributions $[\theta]$, $[\tau_\nu]$, and $[\Sigma_\eta]$, a very simple Bayesian hierarchical model could be specified as
$$\begin{aligned} y_D &= y + \nu, & \nu &\sim [\nu \mid \tau_\nu], \\ y &= \tilde{U}\tilde{D}\, g(\theta; \hat{\alpha}) + \eta, & \eta &\sim \text{Gau}(0, \Sigma_\eta), \\ & [\theta][\tau_\nu][\Sigma_\eta], \end{aligned} \tag{16.7}$$
where $\nu$ is an error process with a distribution that depends on parameters $\tau_\nu$. In addition, a non-Gaussian error distribution could be considered for $\eta$ if warranted. As an alternative to the use of the simulation-derived parameter estimates for $\alpha$, these parameters could be assigned a prior distribution with hyperparameters specified based on the model simulation estimates (e.g., prior mean given by $\hat{\alpha}$), as in Leeds et al. (2014).
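Under additional simplifying assumptions, calibration with (16.7) can be sketched as a random-walk Metropolis sampler for $\theta$ alone. The sketch assumes Gaussian $\nu$ with known variance $\sigma^2_\nu$, a fixed $\Sigma_\eta = \sigma^2_\eta I$, $\hat{\alpha}$ already absorbed into the emulator function, and a uniform prior $[\theta]$ on a box. Integrating out $y$ then gives $y_D \sim \text{Gau}(\tilde{U}\tilde{D}\, g(\theta; \hat{\alpha}), (\sigma^2_\nu + \sigma^2_\eta) I)$. All names and tuning values are hypothetical.

```python
import numpy as np


def calibrate(y_data, emulator, theta0, sigma2_nu, sigma2_eta, lower, upper,
              n_iter=5000, step=0.05, seed=0):
    """Random-walk Metropolis for theta under a simplified version of (16.7).

    Assumptions (not from the text): nu ~ N(0, sigma2_nu I) with sigma2_nu known,
    Sigma_eta = sigma2_eta I fixed, and a uniform prior on the box [lower, upper].
    """
    rng = np.random.default_rng(seed)
    var = sigma2_nu + sigma2_eta  # marginal variance after integrating out y

    def log_target(theta):
        if np.any(theta < lower) or np.any(theta > upper):
            return -np.inf                         # outside the prior support
        resid = y_data - emulator(theta)           # cheap emulator call, not the ABM
        return -0.5 * resid @ resid / var

    theta = np.asarray(theta0, dtype=float)
    current = log_target(theta)
    draws = np.empty((n_iter, theta.size))
    for it in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)
        cand = log_target(proposal)
        if np.log(rng.random()) < cand - current:  # Metropolis acceptance step
            theta, current = proposal, cand
        draws[it] = theta
    return draws
```

Any function mapping $\theta$ to a predicted output vector can be passed as `emulator`. Each evaluation is cheap, which is what makes thousands of likelihood evaluations feasible where running the ABM itself would not be.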
16.4.1 Simple Simulated Epidemic ABM Emulator Example
Consider a very simple cellular ABM to simulate an epidemic for susceptible, infected, and recovered (SIR) agents. This type of model is also referred to as a “compartment model.” In particular, let the state of the $i$th agent at time $t$ take one of the values
$$y_{i,t} = \begin{cases} 0, & \text{susceptible}, \\ 1, \ldots, K_I, & \text{infected}, \\ K_I + 1, \ldots, K_I + K_R, & \text{recovered}. \end{cases}$$
Thus, there is one susceptible state, $K_I$ infected states, and $K_R$ recovered states. The ABM is then specified such that susceptible agents (state 0) can be infected with some probability $\theta_{i,t}$, with odds of infection that increase with the number of infected neighbors. Once infected, the disease is assumed to follow a deterministic course, in which the infected individual goes through $K_I$ infected states followed by $K_R$ recovered states. The agent cannot be reinfected while it is in a recovered state. Specifically, this hybrid deterministic/stochastic SIR model is formulated so that the agent state evolves according to
$$y_{i,t} = \begin{cases} \text{Bern}(\theta_{i,t}), & \text{if } y_{i,t-1} = 0, \\ y_{i,t-1} + 1, & \text{if } 0 < y_{i,t-1} < K_I + K_R, \\ 0, & \text{if } y_{i,t-1} = K_I + K_R, \end{cases}$$
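where
$$\theta_{i,t} = \frac{J_{\mathcal{N}_i,t-1}\, \pi_i}{1 - \pi_i \left(1 - J_{\mathcal{N}_i,t-1}\right)}$$
and
$$J_{\mathcal{N}_i,t-1} \equiv \sum_{j \in \mathcal{N}_i} 1\left(0 < y_{j,t-1} \le K_I\right),$$
with $\pi_i$ the prior probability of the $i$th agent being infected if exposed. In this case, $\theta_{i,t}$ reflects the probability of infection, for which the odds increase as the number of infected neighbors increases. In other words, with one infected neighbor the probability of infection is $\pi_i$, and the odds ratio, defined relative to the odds of infection when one neighbor is infected, is $J_{\mathcal{N}_i,t-1}$ (the number of infected neighbors); rearranging the expression for $\theta_{i,t}$ gives $\theta_{i,t}/(1 - \theta_{i,t}) = J_{\mathcal{N}_i,t-1}\, \pi_i/(1 - \pi_i)$. With no infected neighbors, $\theta_{i,t} = 0$.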
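The update rule above can be sketched in Python as a single synchronous time step on a grid. The 3 × 3 neighborhood, zero padding at the grid edges, and the function names are assumptions (the neighborhood $\mathcal{N}_i$ is not specified here); $\pi$ may be a scalar or a grid of region-specific values, and infected neighbors are those in states $1, \ldots, K_I$.

```python
import numpy as np

# Assumed neighborhood: the 8 cells of the 3x3 window around each agent.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def infection_prob(J, pi):
    """theta = J*pi / (1 - pi*(1 - J)): the odds of infection are J times pi/(1 - pi)."""
    return J * pi / (1.0 - pi * (1.0 - J))


def sir_step(y_prev, pi, K_I, K_R, rng):
    """One update of the hybrid deterministic/stochastic SIR ABM on a grid."""
    infected = ((y_prev > 0) & (y_prev <= K_I)).astype(int)
    padded = np.pad(infected, 1)  # off-grid neighbors count as not infected
    rows, cols = y_prev.shape
    # J[r, c] = number of infected neighbors of agent (r, c).
    J = sum(padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols] for dr, dc in OFFSETS)
    # Deterministic course: advance one state, or return to susceptible at the end.
    y_next = np.where(y_prev == K_I + K_R, 0, y_prev + 1)
    # Stochastic part: susceptible agents become infected with probability theta.
    new_inf = rng.random(y_prev.shape) < infection_prob(J, pi)
    return np.where(y_prev == 0, new_inf.astype(int), y_next)
```

Calling `sir_step` repeatedly from an initial state produces a simulated epidemic trajectory; only the transitions out of state 0 are random.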
As an illustration, we consider a simulation with a regular grid of $32 \times 32$ agents and simulate 150 time steps from a random initial SIR state (assuming a baseline probability of infection of 0.1), with four infected states ($K_I = 4$) and four recovered states ($K_R = 4$). Thus, $y_{i,t}$ can take discrete values $\{0, 1, \ldots, 8\}$. We then allow the prior probability of infection to vary spatially according to an $8 \times 8$ grid of “regions” (each region contains a $4 \times 4$ grid of 16 agents). We generate the prior probability of infection for all cells in the $j$th region according to
$$\Phi^{-1}(p_j) = \beta_1 x_{1,j} + \beta_2 x_{2,j}, \quad \text{i.e.,} \quad p_j = \Phi(\beta_1 x_{1,j} + \beta_2 x_{2,j}),$$
where $\Phi$ is the cumulative distribution function (CDF) of a standard normal distribution, so that $\Phi^{-1}$ acts as a probit link (although other link functions could be considered here), and $x_{1,j}$ and $x_{2,j}$ are assumed to be known covariates (in our case, corresponding to a sinusoidal pattern as shown in Figure 16.3a and b). Thus, the prior probability of infection for the $i$th agent, $\pi_i$, is equal to $p_j$ if the $i$th agent is in region $j$. We simulate this epidemic assuming $\beta_1 = 0.3$ and $\beta_2 = 0.1$, which yields the spatial variation in the probability of infection shown in Figure 16.3c. The initial state and two realizations from this ABM simulation, separated by 10 time steps, are shown in Figure 16.3d through f. Note that the lower probability