16: Hierarchical Agent-Based Spatio-Temporal Dynamic Models for Discrete-Valued Data (2/4)

354 Handbook of Discrete-Valued Time Series

terminology, we might use the term “node” instead of agent; in a similar setting, a virus could spread dynamically through a computer network, with servers acting as agents in that case.

Desirable features of an ABM that might be used to model the rabies epidemic are (1) the flexibility to allow for long-distance dispersal events as well as (2) a more typical diffusion-type process that accounts for the smoother, yet heterogeneous, spread of disease from neighbor to neighbor. Thus, consider the statistical data model for the rabies observations $y_{i,t}$ collected at grid cells $i = 1, \ldots, n$ for times $t = 1, \ldots, T$:

$$y_{i,t} \sim \text{Bern}(\theta_{i,t}), \tag{16.1}$$

where the presence probabilities $\theta_{i,t}$ are assumed to control the observed process and are modeled as

$$\theta_{i,t} = \begin{cases} \phi, & y_{i,t-1} = 1 \\ \tilde{\theta}_{i,t}, & (1 - y_{i,t-1})(I_{i,t-1}) = 1 \\ \psi, & (1 - I_{i,t-1}) = 1, \end{cases} \tag{16.2}$$

such that they depend on a set of indicator variables that identify when a given cell can be influenced by the neighboring cells (i.e., cells in the set $\mathcal{N}_i$, in this case the first- and second-order grid neighbors) that happen to be active at the previous time:

$$I_{i,t-1} = \begin{cases} 1, & \sum_{j \in \mathcal{N}_i} y_{j,t-1} > 0 \\ 0, & \sum_{j \in \mathcal{N}_i} y_{j,t-1} = 0. \end{cases} \tag{16.3}$$

Thus, the data depend on parameters $\theta_{i,t}$, which can take one of three forms (16.2): (1) persistence ($\phi$), (2) short-distance dispersal ($\tilde{\theta}_{i,t}$), or (3) long-distance dispersal ($\psi$). If the processes were fully persistent and not capable of long-distance dispersal, then only the middle term would be necessary in (16.2). It is this near-distance dispersal process ($\tilde{\theta}_{i,t}$) that is the real workhorse of the model, allowing for “intelligent” spreading dynamics to occur. There are a number of ways one could specify the dynamics for the near-distance dispersal probabilities $\tilde{\theta}_{i,t}$. We take an approach similar to that described by Hooten and Wikle (2010) where, given the disease status for the neighborhood at the previous time $y_{\mathcal{N}_i,t-1} \equiv \{y_{j,t-1} : s_j \in \mathcal{N}_i\}$ and an additional set of neighborhood interaction probabilities $p \equiv (p_1, p_2, \ldots, p_{|\mathcal{N}_i|})'$, we have the relationship

$$\tilde{\theta}_{i,t} = 1 - \exp\left((y_{\mathcal{N}_i,t-1})' \log(1 - p)\right). \tag{16.4}$$
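To make the three-way rule in (16.2)–(16.4) concrete, the following minimal Python sketch evaluates the presence probability for a single cell. The function names, the eight-neighbor layout, and the numerical values of `p`, `phi`, and `psi` are illustrative assumptions, not values from the rabies analysis.

```python
import numpy as np

def near_dispersal_prob(y_neighbors, p):
    """Equation (16.4): 1 - exp(y' log(1 - p)) = 1 - prod_j (1 - p_j)^(y_j)."""
    y_neighbors = np.asarray(y_neighbors, dtype=float)
    p = np.asarray(p, dtype=float)
    return 1.0 - np.exp(y_neighbors @ np.log1p(-p))

def presence_prob(y_prev_self, y_neighbors, p, phi, psi):
    """Equations (16.2)-(16.3) for one cell, given its own and its neighbors' previous states."""
    if y_prev_self == 1:
        return phi                      # persistence
    if np.sum(y_neighbors) > 0:         # indicator I_{i,t-1} = 1
        return near_dispersal_prob(y_neighbors, p)
    return psi                          # long-distance dispersal

# Hypothetical example: 8 neighbors, equal interaction probabilities summing to 1.
p = np.full(8, 0.125)
y_nb = np.array([1, 0, 0, 1, 0, 0, 0, 0])   # two occupied neighbors
theta_tilde = near_dispersal_prob(y_nb, p)  # 1 - 0.875**2
```

With two occupied neighbors the result is the probability that at least one of them transmits, which is the "union of the transition probabilities" reading of (16.4).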

This process model (16.4) essentially implies that the probability of disease presence in a given cell is a function of the incoming disease from neighboring cells at the previous time. Specifically, the probability of presence at time $t$ in a previously unoccupied area is the union of the transition probabilities from its occupied neighbors. If we assume the interaction probabilities sum to 1, then a natural stochastic model for $p$ is

$$p \sim \text{Dirichlet}(a), \tag{16.5}$$

where the hyperparameters $a$ could be chosen to represent any prior knowledge pertaining to a bias in dispersal direction. Alternatively, both $p$ and $a$ could be allowed to vary either spatially or temporally. Hooten and Wikle (2010) describe a model that allows for spatial heterogeneity in the interaction probabilities $p$ that correlate with the gradient of a potential field. To briefly describe how this generalization could be implemented, consider a potential field $\alpha(X)$ on the grid that depends on a set of variables $X$ influencing the spread of disease. One possible hypothesis for disease spread might be that it responds to changes in the potential field $\alpha$; for example, an abrupt change in a landscape feature such as a boundary between land and water would lead to an increase in the gradient perpendicular to the direction of the boundary. On a discrete spatial support, such as the one we are considering here for the rabies example, the first-order gradient could be summarized as a set of velocities $a_i$, one in each direction from the cell of interest $i$ to its neighboring cells in $\mathcal{N}_i$. These gradient values $a_i$ (or some function of them) can then serve as hyperparameters in the Dirichlet model for the spatially varying interaction probabilities $p_i$. Of course, the Dirichlet is only one way to model $p_i$; other stochastic models or deterministic functions could be used to link $p_i$ to $a_i$ or $X$.
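As a rough illustration of the gradient idea, the sketch below turns differences in a potential field into positive Dirichlet hyperparameters and draws one set of interaction probabilities. The exponential mapping, the `scale` argument, and the potential values are assumptions made purely for illustration; the text leaves the function of the gradient unspecified.

```python
import numpy as np

rng = np.random.default_rng(0)

def dirichlet_hyperparameters(alpha_center, alpha_neighbors, scale=1.0):
    """Map first-order potential differences to positive hyperparameters.

    The exp(-scale * gradient) link is a hypothetical choice: it favors
    interaction with neighbors at lower potential and guarantees positivity,
    which the Dirichlet distribution requires.
    """
    gradient = np.asarray(alpha_neighbors, dtype=float) - alpha_center
    return np.exp(-scale * gradient)

# Hypothetical potential at cell i and at its 8 grid neighbors.
alpha_i = 0.5
alpha_nb = np.array([0.4, 0.5, 0.9, 0.3, 1.2, 0.5, 0.6, 0.2])

a_i = dirichlet_hyperparameters(alpha_i, alpha_nb)
p_i = rng.dirichlet(a_i)   # one draw of the interaction probabilities for cell i
```

Each draw `p_i` is a probability vector over the neighbors of cell $i$, so the sum-to-one assumption behind (16.5) holds by construction.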

Returning now to the model described in (16.1–16.5), we can express the posterior distribution for all unknowns as proportional to the conditionally factored joint distribution such that

$$[p, a, \phi, \psi \mid \{y_{i,t}, \forall i, t\}] \propto \left( \prod_{t=1}^{T} \prod_{i=1}^{n} [y_{i,t} \mid \theta_{i,t}(p, \phi, \psi)] \right) [p \mid a][a][\phi][\psi]. \tag{16.6}$$

Also, recall that the presence probabilities $\theta_{i,t}$ are a function of the other model parameters; thus, inference can be obtained for them as derived quantities in the model. For example, the posterior mean and standard deviation of the space–time series for $\theta_{i,t}$ are shown in Figure 16.2. The left panel of Figure 16.2 shows the posterior mean for $\theta_{i,t}$ and gives us a quantitative understanding of presence probability along the front of the epidemic over time. Similarly, the posterior standard deviation for $\theta_{i,t}$ shown in the right panel of Figure 16.2 allows us to visualize the uncertainty pertaining to presence probability for all grid cells and times. Such statistical products could be used for short-term forecasting; for example, forecasting the presence for the next year. Such forecasts require a reasonable timescale to accommodate the required computation time.

Overall, the relatively simple statistical ABM presented in (16.1–16.5) represents a fundamentally different approach to modeling dynamics (i.e., bottom-up rather than top-down) that is quite general and capable of representing complicated dynamics based on only a simple set of rules describing the behavior among agents. As a reminder, in this example we let the grid cells act as agents, but there is no reason why individual raccoons could not serve as agents being explicitly modeled if sufficient individual-level data were collected to gain the desired statistical inference. Scale is a critical component of all models, but it seems especially important in statistical ABMs because we seek to invert the models and are thus limited by the scale on which the data were collected.

16.4 Hierarchical First-Order Emulators and ABMs

Parameter estimation, calibration, and validation can be difficult in deterministic and stochastic ABMs given the computational cost of simulation and, in most cases, the nonlinear relationships in the parameters. The use of statistical emulators (or surrogates) for complicated computer simulation models has, in recent years, proven useful to address these issues (e.g., Currin et al., 1991; Kennedy and O’Hagan, 2001; Sacks et al., 1989). Emulators act as a fast approximation to the computer simulation model, and because they are statistical models, are ideal to include within a Bayesian framework to perform uncertainty analysis, sensitivity analysis (O’Hagan, 2006), and model calibration and prediction (e.g., Higdon et al., 2008, 2004). That is, they allow a fairly high-fidelity representation of the simulation model but at a fraction of the computational cost. Statistical emulators have most often been implemented through the use of second-order (covariance) Gaussian process model specifications similar to geostatistical modeling (e.g., Kennedy and O’Hagan, 2001; O’Hagan, 2006; Sacks et al., 1989). Second-order emulators have recently been used to assess parameter sensitivity and estimation in ABMs (e.g., Dancik et al., 2010; Parry et al., 2013). As described in Hooten et al. (2011), it is often desirable to model the input–output relationship for a mechanistic model by using first-order characteristics (e.g., Frolov et al., 2009; Leeds et al., 2013, 2014; van der Merwe et al., 2007). In what follows, we provide a simple example to illustrate the use of first-order emulators to perform parameter estimation in the ABM context.

FIGURE 16.2
Posterior mean (a) and standard deviation (b) for the presence probabilities ($\theta_{i,t}$) for raccoon rabies in Connecticut, USA, during 1991–1995. Time for each image increases from top to bottom and left to right in that order. Left panel values range from zero (white) to one (black); right panel values range from zero (white) to $\sqrt{0.25}$ (black).

Assume we have a limited number of runs of the ABM simulation at input settings $\theta^{(1)}, \ldots, \theta^{(N)}$ and associated output $y^{(1)}, \ldots, y^{(N)}$, where $y^{(i)}$ is a $T$-dimensional vector. Here, “input settings” may refer to model parameters, forcings, or even past values of the process state. We then seek a surrogate statistical model that predicts $y^{*}$ at (untried) input settings $\theta^{*}$. Following the first-order approach outlined in Hooten et al. (2011), we let $Y = (y^{(1)}, \ldots, y^{(N)})$ and consider the singular value decomposition, $Y = UDV'$, where $U$ is a $T \times T$ matrix of left singular vectors, $D$ is a $T \times N$ matrix with the singular values along the principal diagonal, and $V$ is an $N \times N$ matrix of right singular vectors. We then reduce the dimension by truncating the singular value decomposition matrices so that only the first $q$ singular vectors and singular values are considered (e.g., selecting $q$ to account for a significant portion of the variation in the output matrix $Y$). We denote these truncated matrices by $\tilde{U}$, $\tilde{D}$, and $\tilde{V}$, so the product $\tilde{U}\tilde{D}$ is a $T \times q$ matrix and $\tilde{V}$ is an $N \times q$ matrix. We then seek to link the parameters to the model output through the right singular vectors, $\tilde{V}$. Specifically, note that the $i$th parameter vector $\theta^{(i)}$ corresponds to the $i$th row of $\tilde{V}$, which we denote $v^{(i)}$. Thus, the $i$th output vector is approximated by $y^{(i)} \approx \tilde{U}\tilde{D} v^{(i)}$. We model these right singular vectors in terms of the parameters such that $v^{(i)} = g(\theta^{(i)}, \alpha)$, where $g(\cdot)$ is some function of the input parameter vector $\theta^{(i)}$ and parameter vector $\alpha$. We then model $y^{(i)} = \tilde{U}\tilde{D}\, g(\theta^{(i)}, \alpha) + \eta$, where $\eta$ is an error process. Typically, this error process, which accounts for truncation and emulator model representativeness, is assumed to be Gaussian with mean zero and variance–covariance matrix $\Sigma_{\eta}$. Upon estimating the parameters $\alpha$ from the model simulations, for a given input parameter vector $\theta^{*}$ we can predict the model output vector by $y^{*} = \tilde{U}\tilde{D}\, g(\theta^{*}, \hat{\alpha})$. Hooten et al. (2011) consider both linear and nonlinear (random forest) functions for $g(\cdot)$, and Leeds et al. (2014) consider a parametric quadratic nonlinear model for $g(\cdot)$ as developed in Wikle and Hooten (2010) and Wikle and Holan (2011).
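The emulator construction just described can be sketched in a few lines of NumPy. Everything specific here is an assumption for illustration: `simulator` is a hypothetical stand-in for an expensive ABM run, the input is a scalar, and $g(\cdot)$ is taken to be a least-squares fit on the basis $(1, \theta, \theta^2)$, which is linear in its coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)

T, N, q = 50, 30, 2                      # output length, number of runs, truncation level
times = np.linspace(0.0, 1.0, T)

def simulator(theta):
    """Hypothetical stand-in for one ABM run; returns a T-dimensional output."""
    return theta * np.sin(2 * np.pi * times) + theta**2 * times

thetas = rng.uniform(0.5, 2.0, size=N)                   # input settings theta^(1..N)
Y = np.column_stack([simulator(th) for th in thetas])    # T x N output matrix

# Truncated SVD, Y ~ U_q D_q V_q'. D_q is stored as q x q here, so U_q @ D_q is T x q.
U, d, Vt = np.linalg.svd(Y, full_matrices=False)
U_q, D_q, V_q = U[:, :q], np.diag(d[:q]), Vt[:q, :].T    # V_q is N x q

def design(theta):
    """Basis (1, theta, theta^2) used for the assumed form of g."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    return np.column_stack([np.ones_like(theta), theta, theta**2])

# Estimate alpha by regressing the rows of V_q on the basis.
alpha_hat, *_ = np.linalg.lstsq(design(thetas), V_q, rcond=None)   # 3 x q

def emulate(theta_star):
    """Predict y* = U_q D_q g(theta*, alpha_hat)."""
    v_star = design(theta_star) @ alpha_hat              # 1 x q
    return (U_q @ D_q @ v_star.T).ravel()

theta_star = 1.3
truth = simulator(theta_star)
rel_err = np.linalg.norm(emulate(theta_star) - truth) / np.linalg.norm(truth)
```

Because this toy simulator has exactly rank-two output, $q = 2$ recovers it essentially without error; for a real ABM the truncation and the choice of $g(\cdot)$ both contribute to the error term $\eta$.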

In a situation for which one has observation data corresponding to the simulation output vector, a Bayesian approach can be applied to efficiently obtain the posterior parameter distribution for $\theta$ by using the emulator in place of the simulation model. That is, given data vector $y_{\text{obs}}$, parameter estimates $\hat{\alpha}$, prior distributions $[\theta]$, $[\tau_{\nu}]$, and $[\Sigma_{\eta}]$, a very simple Bayesian hierarchical model could be specified as

$$
\begin{aligned}
y_{\text{obs}} &= y + \nu, & \nu &\sim [\nu \mid \tau_{\nu}], \\
y &= \tilde{U}\tilde{D}\, g(\theta; \hat{\alpha}) + \eta, & \eta &\sim \text{Gau}(0, \Sigma_{\eta}), \\
\theta &\sim [\theta], & \tau_{\nu} &\sim [\tau_{\nu}], \quad \Sigma_{\eta} \sim [\Sigma_{\eta}],
\end{aligned} \tag{16.7}
$$

where $\nu$ is an error process with a distribution that depends on parameters $\tau_{\nu}$. In addition, a non-Gaussian error distribution could be considered for $\eta$ if warranted. As an alternative to the use of the simulation-derived parameter estimates for $\alpha$, these parameters could be assigned a prior distribution with hyperparameters specified based on the model simulation estimates (e.g., prior mean given by $\hat{\alpha}$) as in Leeds et al. (2014).

16.4.1 Simple Simulated Epidemic ABM Emulator Example

Consider a very simple cellular ABM to simulate an epidemic for susceptible, infected, and recovered (SIR) agents. This type of model is also referred to as a “compartment model.” In particular, let the state of the $i$th agent at time $t$ take one of the values

$$y_{i,t} \in \begin{cases} 0, & \text{susceptible} \\ 1, \ldots, K_I, & \text{infected} \\ K_I + 1, \ldots, K_I + K_R, & \text{recovered.} \end{cases}$$

Thus, there is one susceptible state, $K_I$ infected states, and $K_R$ recovered states. The ABM is then specified such that susceptible agents (state 0) can be infected with some probability ($\theta_{i,t}$), with the odds of infection increasing depending on the number of neighbors infected. Once infected, the disease is assumed to follow a deterministic course, in which the infected individual goes through $K_I$ infected states followed by $K_R$ recovered states. The agent cannot be reinfected while it is in the recovered state. Specifically, this hybrid deterministic/stochastic SIR model is formulated so that the agent state evolves according to

$$y_{i,t} = \begin{cases} \text{Bern}(\theta_{i,t}), & \text{if } y_{i,t-1} = 0 \\ y_{i,t-1} + 1, & \text{if } 0 < y_{i,t-1} < K_I + K_R \\ 0, & \text{if } y_{i,t-1} = K_I + K_R, \end{cases}$$

where

$$\theta_{i,t} = \frac{\pi_i J_{\mathcal{N}_i,t-1}}{1 - \pi_i (1 - J_{\mathcal{N}_i,t-1})}$$

and

$$J_{\mathcal{N}_i,t-1} \equiv \sum_{j \in \mathcal{N}_i} 1_{(0 < y_{j,t-1})},$$

with $\pi_i$ the prior probability of the $i$th agent being infected if exposed. In this case, $\theta_{i,t}$ reflects the probability of infection, for which the odds increase as the number of infected neighbors increases; in other words, for one neighbor the probability of infection is $\pi_i$, and the odds ratio is $J_{\mathcal{N}_i,t-1}$ (the number of infected neighbors), defined relative to the odds of infection if one neighbor is infected.
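The update rule above can be sketched as a single vectorized time step. This is a minimal illustration under stated assumptions: the neighborhood is taken to be the eight surrounding cells with no wrap-around at the grid edge, only neighbors in states $1, \ldots, K_I$ are counted as infected (following the phrase "the number of infected neighbors"), and the constant $\pi_i = 0.3$ and the seeding of the initial state are hypothetical choices.

```python
import numpy as np

def infection_prob(pi, n_infected):
    """theta = pi*J / (1 - pi*(1 - J)); gives 0 when J = 0 and pi when J = 1."""
    return pi * n_infected / (1.0 - pi * (1.0 - n_infected))

def count_infected_neighbors(state, k_inf):
    """Count infected cells (states 1..k_inf) among the 8 surrounding cells."""
    infected = ((state > 0) & (state <= k_inf)).astype(int)
    padded = np.pad(infected, 1)          # zero padding: no wrap-around at edges
    rows, cols = state.shape
    total = np.zeros_like(infected)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue                  # skip the cell itself
            total += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return total

def sir_step(state, pi, k_inf, k_rec, rng):
    """Advance every agent one time step under the hybrid SIR rule."""
    theta = infection_prob(pi, count_infected_neighbors(state, k_inf))
    newly_infected = (rng.random(state.shape) < theta).astype(state.dtype)
    new_state = np.where(state == 0, newly_infected, state + 1)
    new_state[state == k_inf + k_rec] = 0  # last recovered state returns to susceptible
    return new_state

# Hypothetical run on a 32 x 32 grid with 4 infected and 4 recovered states.
rng = np.random.default_rng(2)
k_inf, k_rec = 4, 4
state = (rng.random((32, 32)) < 0.1).astype(int)   # assumed seeding: about 10% start infected
pi = np.full((32, 32), 0.3)                        # assumed constant prior infection probability
for _ in range(150):
    state = sir_step(state, pi, k_inf, k_rec, rng)
```

Replacing the constant `pi` array with a region-wise map would give the spatially varying version of the simulation.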

As an illustration, we consider a simulation with a regular grid of 32 × 32 agents, and simulate 150 time steps from a random initial SIR state (assuming a baseline probability of infection of 0.1), assuming 4 infection states ($K_I = 4$) and 4 recovered states ($K_R = 4$). Thus, $y_{i,t}$ can take discrete values $\{0, 1, \ldots, 8\}$. We allow the prior probability of infection to then be spatially varying according to an 8 × 8 grid of “regions” (each region contains a 4 × 4 grid of 16 agents). We generate the prior probability of infection for all cells in the $j$th region according to

$$p_j = \Phi^{-1}(\beta_1 x_{1,j} + \beta_2 x_{2,j}),$$

where $\Phi^{-1}$ is the inverse cumulative distribution function (CDF) of a standard normal distribution (although other link functions could be considered here) and $x_{1,j}$ and $x_{2,j}$ are assumed to be known covariates (in our case, corresponding to a sinusoidal pattern as shown in Figure 16.3a and b). Thus, the prior probability of infection for the $i$th agent, $\pi_i$, is equal to $p_j$ if the $i$th agent is in region $j$. We simulate this epidemic assuming $\beta_1 = 0.3$ and $\beta_2 = -0.1$, which gives the probability of infection variation in space as shown in Figure 16.3c. The initial state and two realizations from this ABM simulation, separated by 10 time steps, are shown in Figure 16.3d through f. Note that the lower probability
