392 Handbook of Discrete-Valued Time Series
equation. It is natural to consider these components as a mechanistic description of the
system behavior, and as an observational process. Cressie and Wikle (2011) have presented
a thorough exposition of such a modeling paradigm where their data model is the measurement
equation and their process model is their system equation. From the standpoint
of space-time count data modeling in geographic health studies, it is convenient to first
reparameterize the process model to focus on a transformed variable, see Holan and Wikle
(2015; Chapter 15 in this volume). The non-Gaussian nature of the data model can be
avoided by a transformation where $z_{it} = \log[(y_{it} + e_p)/(e_{it} + e_p)]$, $e_p$ being a small
positive constant. This transforms the observations into an empirical log relative risk. This is a
close-to-Gaussian form for most small area disease incidence.
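The transformation above can be sketched in a few lines of Python. This is only an illustration: the function name, the example counts, and the default value $e_p = 0.5$ are assumptions made here, not values given in the text.

```python
import numpy as np

def empirical_log_relative_risk(y, e, e_p=0.5):
    """Return z_it = log[(y_it + e_p) / (e_it + e_p)] elementwise.

    y   : observed counts, one row per area i, one column per time t
    e   : expected counts, same shape as y
    e_p : small positive constant (0.5 is an assumed default)
    """
    y = np.asarray(y, dtype=float)
    e = np.asarray(e, dtype=float)
    return np.log((y + e_p) / (e + e_p))

# Hypothetical data: 3 small areas (rows) observed over 2 time periods (columns).
y = np.array([[0, 3], [5, 2], [1, 1]])
e = np.array([[1.2, 2.5], [4.0, 2.0], [0.8, 1.0]])
z = empirical_log_relative_risk(y, e)
print(z)
```

The constant $e_p$ keeps the transform finite when an observed count is zero; where observed and expected counts agree, the transformed value is zero.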
The system/process model is now

$$\theta_{it} \mid \theta_{t-1}, \Gamma \sim N(f(\theta_{i,t-1}), \Gamma)$$

$$\Gamma_{tk} = \mathrm{cov}(\theta_{it}, \theta_{ik})$$

with the observational/data process specified as

$$z_{it} \mid \mu_{it}, \Sigma_t \sim N(\mu_{it}, \Sigma_t)$$

$$\exp(\mu_{it}) = \theta_{it}$$

$$\Sigma_{il}^{t} = \mathrm{cov}(z_{it}, z_{lt})$$

where $\Sigma_t$ is defined to be a positive definite spatial covariance matrix. Usually, this
covariance would be thought to be constant in time and so $\Sigma_{il} = \mathrm{cov}(z_{it}, z_{lt})$ will be
constant for all $t$. However, it is possible to generalize the covariance to include temporal
dependence. Note that the covariance of the risks is defined for time only. This too could be
extended to include spatial dependence.
18.3 Model Fitting Issues
Model tting for space-time small area count models has mainly focussed on Bayesian algo-
rithms that access features of the posterior distribution of parameters of interest. While it is
feasible to consider likelihood-based or pseudo-likelihood approaches to these models, it is
now simpler and computationally convenient to use sampling or posterior approximation
approaches.
18.3.1 Posterior Sampling
Once a model is specied, it is usually convenient to consider a hierarchical framework
within which parameter conditioning occurs. Conditional distributions of parameters
within a hierarchy can lead naturally to a Bayesian approach. In that case, we spec-
ify the posterior distribution of parameters given data as p(θ|y) l(y|θ)p(θ).Often in
393 Spatio-Temporal Modeling for Small Area Health Analysis
spatio-temporal models, it is difcult to obtain summaries of quantities from p(θ|y).The
usual approach then is to employ a posterior sampling algorithm. This sampling algorithm
will generate samples from the distribution in question and we can then use the samples to
approximate posterior quantities, such as means, medians, quartiles, or quantiles. MCMC
is often employed to generate such samples (Robert and Casella, 2005; Brooks et al., 2011).
This consists of an iterative algorithm whereby new parameter values are generated from
previously sampled values and which, after sufcient run time, approximates samples
from the correct posterior distribution. The software package WinBUGS and more recent
OpenBUGS have been developed to accommodate a range of MCMC sampling techniques.
For the spatio-temporal examples discussed earlier, a wide range of code is available. The
site http://academicdepartments.musc.edu/phs/research/lawson/data.htm/, (accessed
April 22, 2015.) contains a variety of examples of spatio-temporal models which can be
tted using WinBUGS or OpenBUGS.
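To make the idea of posterior sampling concrete, the following is a minimal sketch of a random-walk Metropolis sampler written in Python rather than in WinBUGS or OpenBUGS. It is not taken from the code referred to above. The model is an assumption chosen for simplicity: a single count $y \sim \mathrm{Po}(e\theta)$ with a $N(0, 1)$ prior on $\eta = \log\theta$, and the data values, step size, and burn-in length are hypothetical.

```python
import numpy as np

def log_posterior(eta, y, e, prior_sd=1.0):
    """Unnormalised log posterior of eta = log(theta).

    Poisson log-likelihood for y ~ Po(e * exp(eta)) with constants dropped,
    plus a N(0, prior_sd^2) log-prior on eta (an assumed prior).
    """
    return y * eta - e * np.exp(eta) - 0.5 * (eta / prior_sd) ** 2

def metropolis(y, e, n_iter=20000, step=0.5, seed=0):
    """Random-walk Metropolis: propose eta' = eta + step * N(0, 1)."""
    rng = np.random.default_rng(seed)
    eta = 0.0
    current = log_posterior(eta, y, e)
    draws = np.empty(n_iter)
    for k in range(n_iter):
        proposal = eta + step * rng.normal()
        candidate = log_posterior(proposal, y, e)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < candidate - current:
            eta, current = proposal, candidate
        draws[k] = eta
    return draws

# Hypothetical data: 12 observed cases against 8 expected.
draws = metropolis(y=12, e=8.0)
theta = np.exp(draws[5000:])  # discard an assumed burn-in of 5000 iterations
print("posterior mean of theta:", theta.mean())
print("quartiles of theta:", np.quantile(theta, [0.25, 0.5, 0.75]))
```

The retained draws are then summarized exactly as described in the text: the sample mean and sample quantiles stand in for the corresponding posterior quantities.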
18.3.2 INLA
A recent development in the use of approximations to Bayesian models has been proposed
by Rue et al. (2009). The basic idea is that a wide range of models that have a latent Gaussian
structure can be approximated via integrated nested Laplace approximation (INLA). These
approximations can be seen as successive approximations of functions within integrals.
The integrals are then approximated by fixed integration schemes. This approximation
approach is now available in R (package R-inla: www.r-inla.org). The INLA website
contains many examples of the use of this approximation package, including spatial analyses.
INLA provides a fast and reasonably accurate alternative approach to MCMC for posterior
parameter estimation. It is particularly useful for large datasets (m > 10,000, say) where
conventional sampling programs would be extremely slow. The main advantages of INLA in its
current form are as follows: fast computation, flexible model specification, and application
to log-linear Gaussian models. The main disadvantages are (currently) that it cannot
handle certain types of missing data, certain types of measurement error or mixtures, certain
models not expressible in log-linear form, and has a limited range of prior distributions. For
applications to spatio-temporal health data, refer to Schrödle and Held (2011), Blangiardo
et al. (2013), and Lawson (2013), Appendix D.
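As a rough illustration of the Laplace idea that underlies INLA, the sketch below approximates a one-dimensional posterior by a Gaussian centered at its mode. It is a single Laplace approximation only: it is neither the nested scheme of Rue et al. (2009) nor the R-INLA interface. The model (one count $y \sim \mathrm{Po}(e\,e^{\eta})$ with a $N(0, 1)$ prior on $\eta$), the data values, and the function name are assumptions made here.

```python
import numpy as np

def laplace_approximation(y, e, prior_sd=1.0, n_newton=50):
    """Gaussian approximation N(mode, sd^2) to the posterior of eta = log(theta).

    The log posterior is y*eta - e*exp(eta) - eta^2 / (2*prior_sd^2) + const,
    which is concave in eta, so Newton's method finds its unique mode.
    """
    eta = 0.0
    for _ in range(n_newton):
        grad = y - e * np.exp(eta) - eta / prior_sd**2
        hess = -e * np.exp(eta) - 1.0 / prior_sd**2
        eta -= grad / hess
    # Curvature at the mode gives the approximate posterior variance.
    hess_at_mode = -e * np.exp(eta) - 1.0 / prior_sd**2
    sd = np.sqrt(-1.0 / hess_at_mode)
    return eta, sd

# Hypothetical data: 12 observed cases against 8 expected.
mode, sd = laplace_approximation(y=12, e=8.0)
print("approximate posterior of log(theta): mean", mode, "sd", sd)
```

No sampling is involved: the cost is a handful of Newton steps, which is the basic reason approximation approaches of this kind are fast relative to MCMC.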
18.4 Advanced Modeling for Special Topics
18.4.1 Latent Components
It is possible to extend space-time models to consider the inclusion of latent components
in either space or time dimensions. While the random effect models of Section 18.2 allow
for some random variation, they do not allow for unobserved latent structure.
For example, we could conceive that a range of temporal (latent) profiles underlie the
incidence in any area. These latent profiles are unobserved but we would like to estimate
them if possible. This type of model can be thought of as spatial clustering of temporal
profiles, so that some areas have different temporal profiles from others. In essence, this
is a form of disaggregation of risk by categorizing groups of areas with similar temporal
variation of risk. One such model could be defined as
$$y_{it} \sim \mathrm{Po}(e_{it}\theta_{it}) \tag{18.4}$$

$$\log(\theta_{it}) \mid \psi_{it} = \alpha_0 + \sum_{l} w_{il}\psi_{lt}, \tag{18.5}$$

where, for each small area $i$, the weights satisfy two conditions, $0 < w_{il} \le 1$ and
$\sum_{l} w_{il} = 1$.
The latent components $\psi_l$ are indexed in time and there are $l = 1, \ldots, L$ unobserved
components. In this formulation, each area has a set of probabilistic weights assigned to any
given temporal component and so can be regarded as “voting” for a component in an area.
Prior distributions for the components in this model are important for identifiability, and
usually, a correlated prior distribution is assumed for $\psi_l$. For example, a first-order random
walk prior distribution is often assumed

$$\psi_{lt} \mid \psi_{l,t-1}, \tau_{\psi_l}^{-1} \sim N(\psi_{l,t-1}, \tau_{\psi_l}^{-1}), \quad l = 1, 2, \ldots, L.$$
Alternatively, an AR(1) prior distribution could be assumed. A variety of choices are
available for prior distributions for the weights. These could be spatially correlated or
uncorrelated. A common choice is to assume that the vector $w_i = (w_{i1}, w_{i2}, \ldots, w_{iL})$ has a
singular multinomial distribution of the form

$$p^{*}_{il} \sim \mathrm{Ga}(1, 1)$$

$$p_{il} = \frac{p^{*}_{il}}{\sum_{k} p^{*}_{ik}}$$

$$w_i \mid p_{i1}, p_{i2}, \ldots, p_{iL} \sim \mathrm{Mult}(1, (p_{i1}, p_{i2}, \ldots, p_{iL})).$$
This leads to a hard classication of the area weight. A soft classication can also be
dened using
α
i
|
α
MCAR(
α
)
w
il
|α
il
, τ
α
LN(α
il
, τ
α
)
w
il
w
il
=
k
w
ik
where MCAR denotes a multivariate CAR prior distribution, which admits correlation
between spatially correlated elds (Gelfand and Vounatsou, 2003). The covariance matrix
α
can have a Wishart prior distribution. A fuller discussion and evaluation of these
models can be found in Lawson et al. (2010) and Choi and Lawson (2011).
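A forward simulation can help fix ideas about how the latent temporal profiles and the area weights combine. The sketch below generates data from a model of the form (18.4) and (18.5) with random-walk profiles and normalized log-normal (soft) weights. It is a simulation only, not a fitting procedure, and it makes several simplifying assumptions: the weights are drawn independently across areas instead of from an MCAR prior, and all sizes, variances, and expected counts are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(1)

m, T, L = 6, 10, 2   # areas, time periods, latent components (hypothetical sizes)
alpha0 = 0.0         # intercept (assumed value)
rw_var = 0.05        # random-walk innovation variance (assumed value)

# First-order random walk for each latent profile: psi[l, t] = psi[l, t-1] + noise,
# with the walk started from zero.
psi = np.cumsum(rng.normal(0.0, np.sqrt(rw_var), size=(L, T)), axis=1)

# Soft weights: positive log-normal draws normalised to sum to one within each area.
# Independence across areas is a simplification of the spatially correlated prior.
w_star = rng.lognormal(mean=0.0, sigma=1.0, size=(m, L))
w = w_star / w_star.sum(axis=1, keepdims=True)

# log(theta_it) = alpha0 + sum_l w_il * psi_lt
log_theta = alpha0 + w @ psi

# y_it ~ Po(e_it * theta_it), with a constant expected count in every cell.
e = np.full((m, T), 5.0)
y = rng.poisson(e * np.exp(log_theta))
print(w.round(2))
print(y)
```

Each row of `w` shows how strongly an area "votes" for each temporal profile; replacing the normalized weights with a single draw from a multinomial with one trial would give the hard classification instead.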
18.4.2 Infectious Diseases
In recent years, there has been rapid progress in developing statistical models for under-
standing and controlling the spread of infectious diseases, which remain a leading cause of
morbidity and mortality worldwide. Unlike the analysis of noninfectious diseases, mod-
els describing infectious disease dynamics must take into account the transmissible nature
of infections. Traditional approaches to modeling the progress of an epidemic include the
so-called compartmental models (Keeling and Rohani, 2008; Vynnycky and White, 2010).
Within this class of models, the SIR model stratifies the population into three subgroups:
those who are susceptible to being infected, those who are infected, and those who are
immune. The discrete-time model describes the progression of the infection through the
number of individuals in each compartment at discrete time steps. The following difference
equations determine the number of individuals in different categories at a particular
time period $t$:

$$S_t = S_{t-1} - \beta I_{t-1} S_{t-1},$$

$$I_t = I_{t-1} + \beta I_{t-1} S_{t-1} - r I_{t-1},$$

$$R_t = R_{t-1} + r I_{t-1},$$
where the disease transmission rate β represents the rate at which two individuals come
into effective contact (a contact that will lead to infection). Here, the transmission rate is
assumed to be constant, but it can be allowed to vary in time. The parameter r represents
the proportion of infected who recover and become immune. Based on the nature of the
infection, alternative compartmental models, such as the Susceptible-Infected-Susceptible
(SIS), Susceptible-Infected-Recovered-Susceptible (SIRS), Susceptible-Exposed-Infected-
Recovered (SEIR), or Susceptible-Exposed-Infected-Recovered-Susceptible (SEIRS) mod-
els, can also be used.
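The SIR difference equations can be iterated directly. The following Python sketch does so for a closed population. The function name, the initial compartment sizes, and the values of $\beta$ and $r$ are hypothetical choices made for illustration.

```python
def simulate_sir(s0, i0, r0, beta, r, n_steps):
    """Iterate the discrete-time SIR difference equations.

    S_t = S_{t-1} - beta * I_{t-1} * S_{t-1}
    I_t = I_{t-1} + beta * I_{t-1} * S_{t-1} - r * I_{t-1}
    R_t = R_{t-1} + r * I_{t-1}
    """
    S, I, R = [float(s0)], [float(i0)], [float(r0)]
    for _ in range(n_steps):
        new_infections = beta * I[-1] * S[-1]
        new_recoveries = r * I[-1]
        S.append(S[-1] - new_infections)
        I.append(I[-1] + new_infections - new_recoveries)
        R.append(R[-1] + new_recoveries)
    return S, I, R

# Hypothetical population of 1000 with 10 initial infections.
S, I, R = simulate_sir(s0=990, i0=10, r0=0, beta=0.0005, r=0.2, n_steps=50)
peak = max(range(len(I)), key=lambda t: I[t])
print("peak number infected:", round(I[peak], 1), "at time", peak)
```

Because every individual leaving one compartment enters another, $S_t + I_t + R_t$ stays equal to the initial population size at every step, which is a useful check on any implementation.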
Morton and Finkenstädt (2005) proposed a stochastic version of the discrete-time SIR
model and described its Bayesian analysis. An extension of that model to the spatial domain
was proposed by Lawson and Song (2010), where a neighborhood infection effect was
incorporated into the model specification to account for spatial transmission. Hooten et al. (2010)
showed the application of an SIRS model to state-level influenza-like illness (ILI) data.
Ideally, spatio-temporal modeling of infectious diseases would be done at the individual
level (Lawson and Leimich, 2000; Deardon et al., 2010). By tracking the status of every
individual in a population, these models provide an accurate description of the spread of
epidemics through time and space. In addition, they allow for heterogeneity in the
population via individual-level covariates. However, information about individual movement
and contact behavior is rarely available. In practice, only partial information about
the total number of infected individuals in each small area and time period is available.
For aggregated counts within small areas and time periods, it is also common to assume
a Poisson data-level model. Hierarchical Poisson models may be appropriate when the
number of susceptibles is unknown and disease counts are small relative to the population
size. One approach within this scenario is to assume that counts of disease $y_{it}$ are Poisson
distributed with mean $\lambda_{it} = e_{it}\theta_{it}$, where $e_{it}$ is the number of cases expected during
nonepidemic conditions and $\theta_{it}$ is the relative risk in area $i$ and time period $t$, $i = 1, \ldots, m$ and
$t = 1, \ldots, T$. Mugglin et al. (2002) described the evolution of epidemics through changes in
the relative risks of disease, which are defined by a vector autoregressive model. Once the
change points have been chosen, stability, growth, and recession of infection are described
by modifying the mean of the innovation term in the autoregressive process. Knorr-Held
and Richardson (2003) modeled the log of the relative risks through latent spatial and
temporal components. An extra term that is a function of the previous number of cases is
incorporated into the relative risk model during epidemic periods, which are differentiated
through latent binary indicators, to explain the increase in incidence. An alternative
approach, which is motivated by a branching process model with immigration, was
proposed by Held et al. (2005). In that model, disease incidence is separated into two
components as follows:

$$\lambda_{it} = \nu_{it} + \gamma y_{i,t-1} + \phi \sum_{j \sim i} y_{j,t-1}$$

where the endemic component $\nu_{it}$ relates disease incidence to latent parameters describing
endemic seasonal patterns and the notation $i \sim j$ denotes that $i$ is a neighbor of $j$. The
epidemic component, which is modeled with an autoregression on the previous numbers of
cases, captures occasional outbreaks beyond seasonal epidemics. Extensions of this model
can be found in Held et al. (2006) and Paul et al. (2008).
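The two-component mean above is straightforward to evaluate once the neighborhood structure is coded as an adjacency matrix. The Python sketch below computes it for one time step. The function name, the three-area map, and the parameter values are hypothetical, and the endemic component is simply supplied as a vector instead of being modeled.

```python
import numpy as np

def two_component_mean(nu_t, y_prev, adjacency, gamma, phi):
    """Mean of the endemic plus epidemic model for all areas at time t.

    lambda_it = nu_it + gamma * y_{i,t-1} + phi * sum over neighbours j of y_{j,t-1}

    adjacency[i, j] = 1 if areas i and j are neighbours, 0 otherwise (zero diagonal),
    so adjacency @ y_prev gives the neighbour sums for every area at once.
    """
    return nu_t + gamma * y_prev + phi * (adjacency @ y_prev)

# Hypothetical map: three areas in a row, so area 2 neighbours areas 1 and 3.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
y_prev = np.array([4, 0, 2])       # counts at time t-1
nu_t = np.array([1.0, 1.0, 1.0])   # endemic component at time t (assumed values)

lam = two_component_mean(nu_t, y_prev, A, gamma=0.5, phi=0.1)
print(lam)
```

In this example the middle area had no cases in the previous period, so its mean exceeds the endemic level only through the cases recorded in its two neighbors.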
18.5 Prospective Analysis and Disease Surveillance
Most of the models described in the previous sections have been developed for retrospective
analyses of disease maps. However, there are situations where real-time modeling and
prediction play a crucial part. This is the case, for instance, of public health surveillance,
which is defined as (Thacker and Berkelman, 1992)
the ongoing, systematic collection, analysis, and interpretation of health data essential
to the planning, implementation, and evaluation of public health practice, closely
integrated with the timely dissemination of these data to those who need to know. The final
link of the surveillance chain is the application of these data to prevention and control.
Hence, sequential analysis of all the data collected so far is key to the early detection
of changes in disease incidence and, consequently, to facilitating a timely public health
response.
Most work on surveillance methodology has evolved in temporal applications, and so
a wide range of methods including process control charts, temporal scan statistics,
regression techniques, and time series methods have been proposed to monitor univariate time
series of counts of disease (Sonesson and Bock, 2003; Unkel et al., 2012). Regression
models have been widely used for outbreak detection. For instance, the log-linear regression
model of Farrington et al. (1996) is used by the Health Protection Agency to detect
aberrations in laboratory-based surveillance data in England and Wales. At each time point, the
observed count of disease is declared aberrant if it lies above a threshold, which is
computed from the estimated model using a set of recent observations with similar conditions.
Within the time series scenario, hidden Markov models have proved to be successful in
monitoring epidemiological data. The basic idea of these models is to segment the time
series of disease counts into epidemic and nonepidemic phases (Le Strat and Carrat, 1999).
Martínez-Beneito et al. (2008) used a hidden Markov model to detect the onset of influenza
epidemics. Unlike previous hidden Markov models, the authors modeled the series of
differenced rates rather than the series of incidence rates. More recently, Conesa et al. (2015)
have proposed an enhanced modeling framework that incorporates the magnitude of the
incidence to better distinguish between epidemic and nonepidemic phases.
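The threshold rule used in regression-based outbreak detection can be illustrated with a deliberately simplified sketch. The code below is not the Farrington et al. (1996) algorithm, which fits a log-linear regression model. Instead, it assumes that the baseline counts from comparable past periods share a common Poisson mean and uses a normal approximation to set the threshold. The function name, the baseline data, and the multiplier 1.96 are assumptions made here.

```python
import numpy as np

def exceeds_threshold(current, baseline, z=1.96):
    """Flag the current count if it lies above an upper threshold.

    baseline : counts from comparable past periods (assumed Poisson, common mean)
    z        : normal quantile used for the threshold (assumed value)

    Threshold = mean + z * sqrt(mean), using the Poisson variance = mean property.
    """
    mu = float(np.mean(baseline))
    threshold = mu + z * np.sqrt(mu)
    return current > threshold, threshold

# Hypothetical baseline counts from the same weeks in previous years.
baseline = [10, 12, 9, 11, 8]
for count in (15, 18):
    flag, threshold = exceeds_threshold(count, baseline)
    print(count, "aberrant" if flag else "not aberrant", "threshold", round(threshold, 2))
```

In a prospective setting, the check is rerun at every new time point, with the baseline set updated to the most recent comparable observations.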