18: Spatio-Temporal Modeling for Small Area Health Analysis (2/4)


392 Handbook of Discrete-Valued Time Series

equation. It is natural to consider these components as a mechanistic description of the system behavior, and as an observational process. Cressie and Wikle (2011) have presented a thorough exposition of such a modeling paradigm where their data model is the measurement equation and their process model is their system equation. From the standpoint of space-time count data modeling in geographic health studies, it is convenient to first reparameterize the process model to focus on a transformed variable (see Holan and Wikle, 2015; Chapter 15 in this volume). The non-Gaussian nature of the data model can be avoided by a transformation where $z_{it} = \log[(y_{it} + \epsilon)/(e_{it} + \epsilon)]$, $\epsilon$ being a small positive constant. This transforms the observations into an empirical log relative risk. This is a close-to-Gaussian form for most small area disease incidence.
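The transformation to an empirical log relative risk can be sketched in a few lines of Python. This is only an illustration: the function name, the default value of the small constant (`eps=0.5`), and the example counts are assumptions, not values taken from the text.

```python
import math

def empirical_log_relative_risk(y, e, eps=0.5):
    """Return log((y + eps) / (e + eps)) for an observed count y and expected count e.

    eps is the small positive constant that keeps the ratio finite when y = 0;
    the default of 0.5 is an assumed, illustrative choice.
    """
    if y < 0 or e < 0 or eps <= 0:
        raise ValueError("y and e must be non-negative and eps must be positive")
    return math.log((y + eps) / (e + eps))

# Hypothetical (observed, expected) counts for three area-time cells.
cells = [(0, 1.2), (4, 3.1), (12, 6.0)]
z = [empirical_log_relative_risk(y, e) for y, e in cells]
```

Cells with fewer cases than expected give negative values and cells with an excess give positive values, which is what makes the transformed variable a natural input to a Gaussian data model.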

The system/process model is now

$$\theta_{it} \mid \theta_{t-1}, \Sigma_\theta \sim N(f(\theta_{i,t-1}), \Sigma_\theta)$$

$$\Sigma_\theta = \mathrm{cov}(\theta_{it}, \theta_{i,t-1})$$

with the observational/data process specified as

$$z_{it} \mid \mu_{it}, \Sigma_z \sim N(\mu_{it}, \Sigma_z)$$

$$\exp(\mu_{it}) = \theta_{it}$$

$$\Sigma_z = \mathrm{cov}(z_{it}, z_{jt})$$

where $\Sigma_z$ is defined to be a positive definite spatial covariance matrix. Usually, this covariance would be thought to be constant in time and so $\Sigma_z = \mathrm{cov}(z_{it}, z_{jt})$ will be constant $\forall t$. However, it is possible to generalize the covariance to include temporal dependence. Note that the covariance of the risks is defined for time only. This too could be extended to include spatial dependence.

18.3 Model Fitting Issues

Model fitting for space-time small area count models has mainly focussed on Bayesian algorithms that access features of the posterior distribution of parameters of interest. While it is feasible to consider likelihood-based or pseudo-likelihood approaches to these models, it is now simpler and computationally more convenient to use sampling or posterior approximation approaches.

18.3.1 Posterior Sampling

Once a model is specified, it is usually convenient to consider a hierarchical framework within which parameter conditioning occurs. Conditional distributions of parameters within a hierarchy can lead naturally to a Bayesian approach. In that case, we specify the posterior distribution of parameters given data as p(θ|y) ∝ l(y|θ)p(θ). Often in spatio-temporal models, it is difficult to obtain summaries of quantities from p(θ|y) directly. The usual approach then is to employ a posterior sampling algorithm. This sampling algorithm will generate samples from the distribution in question and we can then use the samples to approximate posterior quantities, such as means, medians, or other quantiles. Markov chain Monte Carlo (MCMC) is often employed to generate such samples (Robert and Casella, 2005; Brooks et al., 2011). This consists of an iterative algorithm whereby new parameter values are generated from previously sampled values and which, after sufficient run time, yields approximate samples from the correct posterior distribution. The software packages WinBUGS and the more recent OpenBUGS have been developed to accommodate a range of MCMC sampling techniques. For the spatio-temporal examples discussed earlier, a wide range of code is available. The site http://academicdepartments.musc.edu/phs/research/lawson/data.htm/ (accessed April 22, 2015) contains a variety of examples of spatio-temporal models which can be fitted using WinBUGS or OpenBUGS.
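To make the idea of posterior sampling concrete, the following is a minimal random-walk Metropolis sketch in Python. It is not the WinBUGS or OpenBUGS algorithm. The model (one common log relative risk with a Poisson likelihood and a vague Gaussian prior), the data, the step size, and all function names are assumptions chosen for illustration.

```python
import math
import random

def log_posterior(eta, y, e, prior_sd=10.0):
    """Unnormalized log posterior of a common log relative risk eta.

    Assumed model: y_i ~ Poisson(e_i * exp(eta)), eta ~ N(0, prior_sd^2).
    """
    loglik = sum(yi * eta - ei * math.exp(eta) for yi, ei in zip(y, e))
    logprior = -0.5 * (eta / prior_sd) ** 2
    return loglik + logprior

def metropolis(y, e, n_iter=20000, burn_in=5000, step=0.2, seed=1):
    """Random-walk Metropolis sampler; returns the post-burn-in draws of eta."""
    rng = random.Random(seed)
    eta = 0.0
    current = log_posterior(eta, y, e)
    draws = []
    for it in range(n_iter):
        # Propose a new value from the previously sampled one.
        proposal = eta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, y, e)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            eta, current = proposal, candidate
        if it >= burn_in:
            draws.append(eta)
    return draws

# Hypothetical observed and expected counts for five small areas.
y = [5, 9, 3, 12, 7]
e = [4.0, 6.5, 3.2, 8.1, 5.9]
draws = sorted(metropolis(y, e))
post_median_rr = math.exp(draws[len(draws) // 2])  # posterior median relative risk
```

Sorting the retained draws gives direct access to posterior quantiles; a mean or a credible interval can be read off the same list.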

18.3.2 INLA

A recent development in the use of approximations to Bayesian models has been proposed by Rue et al. (2009). The basic idea is that a wide range of models that have a latent Gaussian structure can be approximated via integrated nested Laplace approximation (INLA). These approximations can be seen as successive approximations of functions within integrals. The integrals are then approximated by fixed integration schemes. This approximation approach is now available in R (package R-inla: www.r-inla.org). The INLA website contains many examples of the use of this approximation package, including spatial analyses. INLA provides a fast and reasonably accurate alternative to MCMC for posterior parameter estimation. It is particularly useful for large datasets (m > 10,000, say) where conventional sampling programs would be extremely slow. The main advantages of INLA in its current form are as follows: fast computation, flexible model specification, and application to log-linear Gaussian models. The main disadvantages are (currently) that it cannot handle certain types of missing data, certain types of measurement error or mixtures, or certain models not expressible in log-linear form, and that it has a limited range of prior distributions. For applications to spatio-temporal health data, refer to Schrödle and Held (2011), Blangiardo et al. (2013), and Lawson (2013), Appendix D.
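The building block behind this approach, a single Laplace approximation, can be illustrated with a one-parameter example in Python. This sketch is not INLA itself and does not use the R-inla package: it approximates one posterior density by a Gaussian centred at the posterior mode, whereas INLA nests such approximations and integrates over hyperparameters. The model, prior variance, and function name are assumptions.

```python
import math

def laplace_approximation(y, e, prior_var=1.0, tol=1e-10, max_iter=100):
    """Gaussian approximation to p(eta | y) for an assumed toy model.

    Model: y ~ Poisson(e * exp(eta)), eta ~ N(0, prior_var).
    Returns (mode, variance), where the variance is minus the inverse of the
    second derivative of the log posterior evaluated at the mode.
    """
    eta = 0.0
    for _ in range(max_iter):
        grad = y - e * math.exp(eta) - eta / prior_var
        hess = -e * math.exp(eta) - 1.0 / prior_var  # always negative: log posterior is concave
        step = grad / hess
        eta -= step  # Newton update towards the posterior mode
        if abs(step) < tol:
            break
    hess = -e * math.exp(eta) - 1.0 / prior_var
    return eta, -1.0 / hess

# Hypothetical observed count of 12 against 8 expected cases.
mode, var = laplace_approximation(y=12, e=8.0)
```

Because no sampling is involved, the cost is a handful of Newton iterations, which gives some intuition for why approximation schemes of this kind scale well to large datasets.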

18.4 Advanced Modeling for Special Topics

18.4.1 Latent Components

It is possible to extend space-time models to consider the inclusion of latent components in either space or time dimensions. While the random effect models of Section 18.2 allow for some random variation, they do not allow for unobserved latent structure.

For example, we could conceive that a range of temporal (latent) profiles underlie the incidence in any area. These latent profiles are unobserved but we would like to estimate them if possible. This type of model can be thought of as spatial clustering of temporal profiles, so that some areas have different temporal profiles from others. In essence, this is a form of disaggregation of risk by categorizing groups of areas with similar temporal variation of risk. One such model could be defined as




$$y_{it} \sim \mathrm{Po}(e_{it}\theta_{it}) \qquad (18.4)$$

$$\log(\theta_{it}) \mid \psi = \alpha_0 + \sum_{l=1}^{L} w_{il}\psi_{lt}, \qquad (18.5)$$

where, for each small area i, the weights satisfy two conditions, $0 < w_{il} \le 1$ and $\sum_{l=1}^{L} w_{il} = 1$. The latent components $\psi_{lt}$ are indexed in time and there are l = 1, ..., L unobserved components. In this formulation, each area has a set of probabilistic weights assigned to the temporal components and so can be regarded as “voting” for a component. Prior distributions for the components in this model are important for identifiability, and usually, a correlated prior distribution is assumed for $\psi_{lt}$. For example, a first-order random walk prior distribution is often assumed

$$\psi_{lt} \mid \psi_{l,t-1}, \tau^{-1} \sim N(\psi_{l,t-1}, \tau^{-1}), \quad \forall l = 1, 2, \ldots, L$$

Alternatively, an AR(1) prior distribution could be assumed. A variety of choices are available for prior distributions for the weights. These could be spatially correlated or uncorrelated. A common choice is to assume that the vector $w_i = (w_{i1}, w_{i2}, \ldots, w_{iL})^{\top}$ has a singular multinomial distribution of the form

$$w^{*}_{il} \sim \mathrm{Ga}(1, 1)$$

$$p_{il} = w^{*}_{il} \Big/ \sum_{k=1}^{L} w^{*}_{ik}$$

$$w_i \mid p_{i1}, p_{i2}, \ldots, p_{iL} \sim \mathrm{Mult}(1, (p_{i1}, p_{i2}, \ldots, p_{iL})).$$

This leads to a hard classification of the area weight. A soft classification can also be defined using

$$\alpha_i \mid \Sigma_\alpha \sim \mathrm{MCAR}(\Sigma_\alpha)$$

$$w^{*}_{il} \mid \alpha_{il}, \tau \sim \mathrm{LN}(\alpha_{il}, \tau)$$

$$w_{il} = w^{*}_{il} \Big/ \sum_{k=1}^{L} w^{*}_{ik}$$

where MCAR denotes a multivariate CAR prior distribution, which admits correlation between spatially correlated fields (Gelfand and Vounatsou, 2003). The covariance matrix $\Sigma_\alpha$ can have a Wishart prior distribution. A fuller discussion and evaluation of these models can be found in Lawson et al. (2010) and Choi and Lawson (2011).
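The structure of the latent component model with hard classification can be explored by simulating from its prior. The Python sketch below is illustrative only: the dimensions, the intercept `alpha0`, the precision `tau`, the Gaussian starting value of each random walk, and the function name are all assumptions, and no model fitting is attempted.

```python
import math
import random

def simulate_latent_component_model(m=6, T=10, L=2, alpha0=0.0, tau=25.0, seed=3):
    """Draw temporal components, hard area weights, and relative risks from the prior."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(tau)
    # Temporal components psi[l][t]: first-order random walk with precision tau.
    # The initial value is drawn from N(0, 1/tau), an assumed starting distribution.
    psi = []
    for _ in range(L):
        path = [rng.gauss(0.0, sd)]
        for _ in range(1, T):
            path.append(rng.gauss(path[-1], sd))
        psi.append(path)
    # Hard classification: Ga(1, 1) draws normalized to probabilities,
    # followed by a single multinomial draw per area.
    weights = []
    for _ in range(m):
        w_star = [rng.gammavariate(1.0, 1.0) for _ in range(L)]
        total = sum(w_star)
        p = [ws / total for ws in w_star]
        chosen = rng.choices(range(L), weights=p)[0]
        weights.append([1 if l == chosen else 0 for l in range(L)])
    # Relative risks: log(theta_it) = alpha0 + sum_l w_il * psi_lt.
    theta = [
        [math.exp(alpha0 + sum(weights[i][l] * psi[l][t] for l in range(L)))
         for t in range(T)]
        for i in range(m)
    ]
    return psi, weights, theta

psi, weights, theta = simulate_latent_component_model()
```

With hard classification every area follows exactly one of the L temporal profiles, so areas sharing a component have identical relative risk trajectories in this simulation. A soft classification would replace the one-hot weight vectors with normalized positive weights.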

18.4.2 Infectious Diseases

In recent years, there has been rapid progress in developing statistical models for understanding and controlling the spread of infectious diseases, which remain a leading cause of morbidity and mortality worldwide. Unlike the analysis of noninfectious diseases, models describing infectious disease dynamics must take into account the transmissible nature of infections. The traditional approach to modeling the progress of an epidemic includes the so-called compartmental models (Keeling and Rohani, 2008; Vynnycky and White, 2010). Within this class of models, the SIR model stratifies the population into three subgroups: those who are susceptible to being infected (S), those who are infected (I), and those who are immune (R). The discrete-time model describes the progression of the infection through the number of individuals in each compartment at discrete time steps. The following difference equations determine the number of individuals in different categories at a particular time period t

$$S_t = S_{t-1} - \beta S_{t-1} I_{t-1}$$

$$I_t = I_{t-1} + \beta S_{t-1} I_{t-1} - r I_{t-1}$$

$$R_t = R_{t-1} + r I_{t-1}$$

where the disease transmission rate β represents the rate at which two individuals come into effective contact (a contact that will lead to infection). Here, the transmission rate is assumed to be constant, but it can be allowed to vary in time. The parameter r represents the proportion of infected individuals who recover and become immune. Based on the nature of the infection, alternative compartmental models, such as the Susceptible-Infected-Susceptible (SIS), Susceptible-Infected-Recovered-Susceptible (SIRS), Susceptible-Exposed-Infected-Recovered (SEIR), or Susceptible-Exposed-Infected-Recovered-Susceptible (SEIRS) models, can also be used.
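The discrete-time SIR recursion can be iterated directly. The Python sketch below assumes the usual mass-action form, in which the number of new infections in a period is β times the product of the current susceptible and infected counts. The parameter values, the cap on new infections, and the function name are illustrative assumptions.

```python
def simulate_sir(s0, i0, r0, beta, r, n_steps):
    """Iterate the deterministic discrete-time SIR difference equations.

    Returns a list of (S_t, I_t, R_t) tuples for t = 0, ..., n_steps.
    """
    s, i, rec = float(s0), float(i0), float(r0)
    path = [(s, i, rec)]
    for _ in range(n_steps):
        # Assumed safeguard: new infections cannot exceed the remaining susceptibles.
        new_infections = min(beta * s * i, s)
        recoveries = r * i
        s, i, rec = s - new_infections, i + new_infections - recoveries, rec + recoveries
        path.append((s, i, rec))
    return path

# Hypothetical epidemic in a closed population of 1000 individuals.
path = simulate_sir(s0=990, i0=10, r0=0, beta=0.0005, r=0.2, n_steps=60)
peak_infected = max(i for _, i, _ in path)
```

Every individual leaving one compartment enters the next, so the total population size stays constant along the path. A time-varying transmission rate could be accommodated by passing a sequence of β values instead of a single number.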

Morton and Finkenstädt (2005) proposed a stochastic version of the discrete-time SIR model and presented its Bayesian analysis. An extension of that model to the spatial domain was proposed by Lawson and Song (2010), where a neighborhood infection effect was incorporated into the model specification to account for spatial transmission. Hooten et al. (2010) showed the application of an SIRS model to state-level influenza-like illness (ILI) data.

Ideally, spatio-temporal modeling of infectious diseases would be done at the individual level (Lawson and Leimich, 2000; Deardon et al., 2010). By tracking the status of every individual in a population, these models provide an accurate description of the spread of epidemics through time and space. In addition, they allow for heterogeneity in the population via individual-level covariates. However, information about individual movement and contact behavior is rarely available. In practice, only partial information about the total number of infected individuals in each small area and time period is available.

For aggregated counts within small areas and time periods, it is also common to assume a Poisson data-level model. Hierarchical Poisson models may be appropriate when the number of susceptibles is unknown and disease counts are small relative to the population size. One approach within this scenario is to assume that counts of disease $y_{it}$ are Poisson distributed with mean $\lambda_{it} = e_{it}\theta_{it}$, where $e_{it}$ is the number of cases expected during nonepidemic conditions and $\theta_{it}$ is the relative risk in area i and time period t, i = 1, ..., m and t = 1, ..., T. Mugglin et al. (2002) described the evolution of epidemics through changes in the relative risks of disease, which are defined by a vector autoregressive model. Once the change points have been chosen, stability, growth, and recession of infection are described by modifying the mean of the innovation term in the autoregressive process. Knorr-Held and Richardson (2003) modeled the log of the relative risks through latent spatial and temporal components. An extra term that is a function of the previous number of cases is incorporated into the relative risk model during epidemic periods, which are differentiated through latent binary indicators, to explain the increase in incidence. An alternative approach, which is motivated from a branching process model with immigration, was proposed by Held et al. (2005). In that model, disease incidence is separated into two components as follows:

$$\mu_{it} = \nu_{it} + \gamma y_{i,t-1} + \phi \sum_{j \sim i} y_{j,t-1}$$

where $\mu_{it}$ denotes the mean incidence in area i at time t, the endemic component $\nu_{it}$ relates disease incidence to latent parameters describing endemic seasonal patterns, and the notation i ∼ j denotes that i is a neighbor of j. The epidemic component, which is modeled with an autoregression on the previous numbers of cases, captures occasional outbreaks beyond seasonal epidemics. Extensions of this model can be found in Held et al. (2006) and Paul et al. (2008).
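The split into an endemic and an epidemic part can be made concrete by evaluating the mean for a toy neighborhood structure. In the Python sketch below the adjacency list, the parameter values, and the function name are hypothetical, and the parameters are treated as known rather than estimated.

```python
def endemic_epidemic_mean(nu, y_prev, neighbors, gamma, phi):
    """Return, for every area i, nu_i + gamma * y_{i,t-1} + phi * sum over neighbors j of y_{j,t-1}."""
    return [
        nu[i] + gamma * y_prev[i] + phi * sum(y_prev[j] for j in neighbors[i])
        for i in range(len(nu))
    ]

# Hypothetical example: three areas arranged on a line (0 - 1 - 2).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
nu = [1.0, 1.0, 1.0]   # endemic component in each area
y_prev = [0, 10, 2]    # counts observed in the previous period
mu = endemic_epidemic_mean(nu, y_prev, neighbors, gamma=0.5, phi=0.1)
```

Area 0 had no cases in the previous period, yet its mean rises above the endemic level because its neighbor, area 1, reported ten cases. This is the spatial spread that the neighborhood term is meant to capture.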

18.5 Prospective Analysis and Disease Surveillance

Most of the models described in the previous sections have been developed for retrospective analyses of disease maps. However, there are situations where real-time modeling and prediction play a crucial part. This is the case, for instance, of public health surveillance, which is defined as (Thacker and Berkelman, 1992)

"the ongoing, systematic collection, analysis, and interpretation of health data essential to the planning, implementation, and evaluation of public health practice, closely integrated with the timely dissemination of these data to those who need to know. The final link of the surveillance chain is the application of these data to prevention and control."

Hence, sequential analyses of all the data collected so far are a key concept to early detection of changes in disease incidence and, consequently, to facilitate timely public health response.

Most work on surveillance methodology has evolved in temporal applications, and so a wide range of methods including process control charts, temporal scan statistics, regression techniques, and time series methods have been proposed to monitor univariate time series of counts of disease (Sonesson and Bock, 2003; Unkel et al., 2012). Regression models have been widely used for outbreak detection. For instance, the log-linear regression model of Farrington et al. (1996) is used by the Health Protection Agency to detect aberrations in laboratory-based surveillance data in England and Wales. At each time point, the observed count of disease is declared aberrant if it lies above a threshold, which is computed from the estimated model using a set of recent observations with similar conditions.
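The general logic of threshold-based detection can be sketched in Python as follows. This is a deliberately simplified stand-in and not the Farrington et al. (1996) algorithm: it replaces the fitted log-linear regression model by the sample mean and standard deviation of a set of baseline counts. The multiplier `z`, the baseline data, and the function name are assumptions.

```python
import statistics

def is_aberrant(current_count, baseline_counts, z=1.96):
    """Flag current_count if it exceeds mean + z * sd of the baseline counts.

    Returns a (flag, threshold) pair. The baseline is meant to hold past
    observations made under comparable conditions, for example the same
    weeks in previous years.
    """
    mean = statistics.mean(baseline_counts)
    sd = statistics.stdev(baseline_counts)
    threshold = mean + z * sd
    return current_count > threshold, threshold

# Hypothetical baseline of eight comparable past counts.
baseline = [4, 6, 5, 7, 3, 5, 6, 4]
flag, threshold = is_aberrant(15, baseline)
```

A regression-based method refines this idea by deriving the expected count and its variability from a model that can account for trend, seasonality, and overdispersion.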

Within the time series scenario, hidden Markov models have proved to be successful in monitoring epidemiological data. The basic idea of these models is to segment the time series of disease counts into epidemic and nonepidemic phases (Le Strat and Carrat, 1999). Martínez-Beneito et al. (2008) used a hidden Markov model to detect the onset of influenza epidemics. Unlike previous hidden Markov models, the authors modeled the series of differenced rates rather than the series of incidence rates. More recently, Conesa et al. (2015) have proposed an enhanced modeling framework that incorporates the magnitude of the incidence to better distinguish between epidemic and nonepidemic phases.
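The segmentation idea can be illustrated with a forward filter for a generic two-state Poisson hidden Markov model, written in Python. This sketch does not reproduce any of the cited models: the Poisson rates, the state persistence probabilities, the initial distribution, and the data are assumed values, and the parameters are held fixed instead of being estimated.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability mass function, evaluated on the log scale for stability."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def filter_epidemic_probability(counts, rates=(5.0, 20.0), stay=(0.95, 0.90), init=(0.9, 0.1)):
    """Forward filter for a two-state Poisson hidden Markov model.

    State 0 is the nonepidemic phase and state 1 the epidemic phase.
    Returns P(state_t = epidemic | y_1, ..., y_t) for each time point t.
    """
    trans = [[stay[0], 1.0 - stay[0]], [1.0 - stay[1], stay[1]]]
    belief = list(init)
    probs = []
    for t, y in enumerate(counts):
        if t > 0:
            # Predict: propagate the previous filtered belief through the transition matrix.
            belief = [sum(belief[j] * trans[j][k] for j in range(2)) for k in range(2)]
        # Update: weight by the Poisson likelihood of the new count and renormalize.
        weighted = [belief[k] * poisson_pmf(y, rates[k]) for k in range(2)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        probs.append(belief[1])
    return probs

# Hypothetical weekly counts with an outbreak in the middle of the series.
counts = [4, 6, 5, 7, 18, 22, 25, 19, 6, 4]
probs = filter_epidemic_probability(counts)
```

Because each probability uses only the data observed up to that time point, the filter suits prospective surveillance: an alarm can be raised as soon as the epidemic probability crosses a chosen level.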
