6
State Space Models for Count Time Series
Richard A. Davis and William T.M. Dunsmuir
CONTENTS
6.1 Introduction...................................................................................121
6.2 Approaches to Estimation......................................................................126
    6.2.1 Estimating Equations...................................................................127
    6.2.2 Likelihood-based Methods...............................................................128
    6.2.3 Earlier Monte Carlo and Approximate Methods............................................129
    6.2.4 Laplace and Gaussian Approximations....................................................130
    6.2.5 Importance Sampling....................................................................133
          6.2.5.1 Importance Sampling based on Laplace Approximation...........................133
          6.2.5.2 Importance Sampling based on Gaussian Approximation..........................134
          6.2.5.3 Efficient Importance Sampling................................................134
    6.2.6 Composite Likelihood...................................................................135
6.3 Applications to Analysis of Polio Data.......................................................138
    6.3.1 Estimate of Trend Coefficient β̂_2.....................................................139
    6.3.2 Estimate of Latent Process Parameters..................................................140
    6.3.3 Comparisons of Computational Speed.....................................................140
    6.3.4 Some Recommendations...................................................................140
6.4 Forecasting..................................................................................141
References.......................................................................................143
6.1 Introduction
The family of linear state-space models (SSMs), which has been a staple of the time series
literature for more than 70 years, provides a flexible modeling framework that is applicable
to a wide range of time series. The popularity of these models stems in large part from the
development of the Kalman recursions, which provide a quick updating scheme for predicting,
filtering, and smoothing a time series. In addition, many commonly used time series models,
such as univariate and multivariate ARMA and ARIMA processes, can be embedded in an SSM and
as such can take advantage of the fast recursive calculations for prediction and filtering
afforded by the Kalman recursions. Recent accounts of linear state space models can be found
in Brockwell and Davis (1991), Brockwell and Davis (2002), Durbin and Koopman (2012), and
Shumway and Stoffer (2011).
122 Handbook of Discrete-Valued Time Series
SSMs can be described via two equations: the state equation, which describes the evolution
of the state S_t of the system, and an observation equation, which describes the relationship
between the observed value Y_t of the time series and the state variable S_t. In this chapter,
we shall assume that the state process evolves according to its own probabilistic mechanism;
that is, the conditional distribution of S_t given all past states and past observations
depends only on the past states. More formally, the conditional distribution of S_t given
S_{t-1}, S_{t-2}, ..., Y_{t-1}, Y_{t-2}, ... is the same as the conditional distribution of
S_t given S_{t-1}, S_{t-2}, .... The case in which the conditional distribution depends on
previous observations is considered in other chapters in this volume (see Tjøstheim [2015;
Chapter 4 in this volume]). See also Brockwell and Davis (2002) for a treatment of this more
general case.
For the linear SSM specification, the observation equation and its companion state equation,
which govern the univariate observation process {Y_t} and the time evolution of the
s-dimensional state variables {S_t}, are specified via a linear relationship. One of the
simplest linear SSM specifications is

    Observation equation: Y_t = G S_t + V_t,  t = 1, 2, ...,  {V_t} ~ IID(0, σ²),
    State equation:       S_{t+1} = F S_t + U_t,  t = 1, 2, ...,  {U_t} ~ IID(0, R),    (6.1)

where

    IID(0, S) denotes an independent and identically distributed sequence of random
    variables (vectors) with mean 0 and variance (covariance matrix) S,
    G is a 1 × s dimensional matrix,
    F is an s × s dimensional matrix, and
    the sequence {Y_1, (U_t, V_t), t = 1, 2, ...} is IID, where U_t and V_t are allowed
    to be dependent.
Often, one assumes that the matrix F in the state equation has eigenvalues that are less
than one in absolute value, in which case S_t is viewed as a causal vector AR(1) process
(see Brockwell and Davis, 1991). When the noise terms {U_t} and {V_t} are Gaussian, this
state-space system is referred to as a Gaussian SSM.
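As a concrete illustration of (6.1) and the Kalman recursions mentioned above, the following
sketch simulates a one-dimensional Gaussian SSM and runs the standard Kalman filter to
produce one-step state predictions. The parameter values (F = 0.7, G = 1, and so on) are
arbitrary choices for illustration, not values taken from this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model (6.1) in one dimension: Y_t = G*S_t + V_t,  S_{t+1} = F*S_t + U_t.
G, F = 1.0, 0.7          # |F| < 1, so {S_t} is a causal AR(1)
sigma2, R = 0.5, 1.0     # var(V_t) and var(U_t)
n = 200

# Simulate the state and observation processes.
S = np.empty(n)
S[0] = rng.normal(scale=np.sqrt(R / (1 - F**2)))   # stationary start
for t in range(1, n):
    S[t] = F * S[t - 1] + rng.normal(scale=np.sqrt(R))
Y = G * S + rng.normal(scale=np.sqrt(sigma2), size=n)

# Kalman filter: one-step prediction of S_t given Y_1, ..., Y_{t-1}.
s_pred, P_pred = 0.0, R / (1 - F**2)   # stationary prior for S_1
preds = []
for t in range(n):
    preds.append(s_pred)
    K = P_pred * G / (G**2 * P_pred + sigma2)      # Kalman gain
    s_filt = s_pred + K * (Y[t] - G * s_pred)      # filter with Y_t
    P_filt = (1 - K * G) * P_pred
    s_pred = F * s_filt                            # predict S_{t+1}
    P_pred = F**2 * P_filt + R

print(np.mean((np.array(preds) - S) ** 2))  # prediction MSE, below var(S_t)
```

The point of the exercise is that the filter's prediction error is smaller than the
unconditional state variance R/(1 − F²), which is what the recursions buy over ignoring
the data.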
It is well known (see, e.g., Proposition 2.1 in Davis and Rosenblatt [1991]) that, except
in the degenerate case, the marginal distribution of a univariate autoregressive process
with iid noise is continuous. So if one considers a stationary solution of (6.1) in the
one-dimensional case, S_t has a continuous distribution. Consequently, the distribution of
Y_t in (6.1) cannot have a discrete component. As such, this precludes the use of the linear
SSM as specified in (6.1) for modeling count time series.
Nonlinear SSMs possess a similar structure to the linear state space model. The linear
equations are now replaced by specifying the appropriate conditional distributions. Writing
Y^{(t)} = (Y_1, ..., Y_t)' and S^{(t)} = (S_1, ..., S_t)', the observation and state
equations become:

    Observation equation: p(y_t | s_t) = p(y_t | s_t, s^{(t-1)}, y^{(t-1)}),  t = 1, 2, ...,    (6.2)
    State equation:       p(s_{t+1} | s_t) = p(s_{t+1} | s_t, s^{(t-1)}, y^{(t)}),  t = 1, 2, ...,    (6.3)
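The conditional densities (6.2) and (6.3) determine a forward simulation scheme: draw S_1
from its initial density, then alternate between drawing Y_t given S_t and advancing the
state. A minimal sketch, with hypothetical concrete choices (a Gaussian AR(1) state density
and a Poisson observation density; the parameter values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical concrete choices for (6.2) and (6.3):
#   state:       S_{t+1} | S_t = s_t  ~  N(phi * s_t, tau2)   (Gaussian AR(1))
#   observation: Y_t | S_t = s_t      ~  Poisson(exp(s_t))
phi, tau2, n = 0.8, 0.3, 100

s = rng.normal(scale=np.sqrt(tau2 / (1 - phi**2)))   # draw S_1 from p_1
y = []
for t in range(n):
    y.append(rng.poisson(np.exp(s)))                 # Y_t depends only on S_t
    s = phi * s + rng.normal(scale=np.sqrt(tau2))    # advance the state
```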
where p(y_t | s_t) and p(s_{t+1} | s_t) are prespecified probability density functions.†
The joint density for the observations and state components can be rewritten (Brockwell and
Davis, 2002, Section 8.8) as

    p(y^{(n)}, s^{(n)}) = ∏_{j=1}^{n} p(y_j | s_j) ∏_{j=2}^{n} p(s_j | s_{j-1}) p_1(s_1),

where p_1(·) is a pdf for the initial state vector S_1. Using the fact that {S_t} is a
Markov process, it follows that

    p(y^{(n)} | s^{(n)}) = ∏_{j=1}^{n} p(y_j | s_j).    (6.4)
Even though the nonlinear state-space formulation given in (6.2) and (6.3) is quite general,
the specification can be limiting in that the state process is required to be Markov. An
alternative specification of a general SSM can be fashioned using (6.4) as a starting point.
That is, given values for the state process S_1, ..., S_n, the random variables Y_1, ..., Y_n
are assumed to be conditionally independent in the sense of (6.4). Using (6.4) and any joint
density p(s^{(n)}) for S^{(n)}, the joint density of Y^{(n)} is then

    p(y^{(n)}) = ∫_{R^n} ∏_{j=1}^{n} p(y_j | s_j) p(s^{(n)}) ds^{(n)}.    (6.5)

So in general, the distribution of the state process can be freely specified and, in
particular, the state process need not be Markov.
For count time series, the observation probability density function must have support on
a subset of {0, 1, 2, ...} and is generally chosen to be one of the common discrete
distribution functions (i.e., Poisson, negative binomial, Bernoulli, geometric, etc.), where
the parameter of the discrete distribution is a function of the state S_t. One common choice
is to assume that the conditional distribution of Y_t given S_t is a member of a
one-dimensional exponential family with canonical parameter s_t, that is,

    p(y_t | s_t) = exp{s_t y_t − b(s_t) + c(y_t)},  y_t = 0, 1, ...,    (6.6)

where b(·), c(·) are known real functions and s_t is the canonical parameter (see McCullagh
and Nelder, 1989). Recall that for an exponential family as specified in (6.6), we have the
following properties:
    E(Y_t | S_t = s_t) = b'(s_t),
    var(Y_t | S_t = s_t) = b''(s_t),

where b' and b'' refer to the first and second derivatives. The canonical link function g
for an exponential family model maps the mean function into the canonical parameter, that
is, g(μ) = (b')^{−1}(μ). In other words, the link function is the inverse of the conditional
mean function, which provides the mapping that expresses the state as a function of the
conditional mean.

† These are probability density functions relative to some dominating measure, which is
usually taken to be counting measure on {0, 1, ...} for p(y_t | s_t) and Lebesgue measure
for p(s_t | s_{t-1}).
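These two moment identities are easy to check numerically. For the Poisson case below, where
b(s) = e^s so that b'(s) = b''(s) = e^s, a quick simulation-based sanity check (the value
s = 0.4 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)

# Poisson in canonical form (6.6): b(s) = exp(s), c(y) = -log(y!).
s = 0.4                       # a fixed value of the canonical parameter
b1 = np.exp(s)                # b'(s): conditional mean
b2 = np.exp(s)                # b''(s): conditional variance

draws = rng.poisson(np.exp(s), size=200_000)
print(draws.mean(), b1)       # both close to exp(0.4) ≈ 1.49
print(draws.var(), b2)

# The canonical link g(mu) = log(mu) recovers the canonical parameter.
assert np.isclose(np.log(b1), s)
```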
Examples:

1. Poisson: p(y_t | s_t) = exp{s_t y_t − e^{s_t} − log(y_t!)}, y_t = 0, 1, .... In this
   case E(Y_t | s_t) = exp{s_t} and hence the link function is g(μ) = log μ.
2. Binomial: Assuming the number of trials in the binomial at time t is m_t, which is known
   but could vary with t, then

       p(y_t | s_t) = exp{s_t y_t − m_t log(1 + e^{s_t}) + log (m_t choose y_t)},
       y_t = 0, 1, ..., m_t,

   where the probability of success on a single trial is p_t = e^{s_t}/(1 + e^{s_t}). In
   this case, b_t(s_t) = m_t log(1 + e^{s_t}), E(Y_t | S_t = s_t) = m_t e^{s_t}/(1 + e^{s_t}),
   and the link is the logit function, g(p_t) = logit(p_t) = log(p_t/(1 − p_t)).
3. Negative binomial: The NegBin(r, p) distribution has density function

       p(y; r, p) = exp{log(1 − p) y + r log p + log (y + r − 1 choose r − 1)},
       y = 0, 1, 2, ...,

   with mean r(1 − p)/p. If we use the natural parameterization in which s_t represents the
   canonical parameter, that is, s_t = log(1 − p_t), then we see that s_t < 0. From a
   modeling perspective, this may be too restrictive. Allowing S_t to be a linear time
   series model that takes both positive and negative values permits greater flexibility in
   terms of the behavior of S_t. So instead of using the canonical link function, consider
   the alternative link function g̃(μ) = −log(r/μ) = −log(p/(1 − p)) = −logit(p). Setting
   s_t = g̃(μ_t) = −logit(p_t), we obtain p_t = e^{−s_t}/(1 + e^{−s_t}), so that the
   conditional density of Y_t given S_t = s_t becomes

       p(y_t | s_t) = exp{−log(1 + e^{−s_t}) y_t + r log(e^{−s_t}/(1 + e^{−s_t}))
                          + log (y_t + r − 1 choose r − 1)},  y_t = 0, 1, 2, ...,

   with conditional mean E(Y_t | s_t) = r(1 − p_t)/p_t = r e^{s_t}.
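Under the alternative link, taking p_t = e^{−s_t}/(1 + e^{−s_t}) (our reading of the
parameterization above), the negative binomial mean r(1 − p_t)/p_t reduces to r e^{s_t},
which now increases in s_t just as in the Poisson case. A quick simulation check with
arbitrary values of r and s_t:

```python
import numpy as np

rng = np.random.default_rng(3)

# Alternative link for the negative binomial: s_t = -logit(p_t), so
# p_t = exp(-s_t)/(1 + exp(-s_t)) and E(Y_t | s_t) = r*(1 - p_t)/p_t = r*exp(s_t).
r, s_t = 3, 0.5
p_t = np.exp(-s_t) / (1 + np.exp(-s_t))

# numpy's negative_binomial counts failures before r successes: mean r*(1-p)/p.
draws = rng.negative_binomial(r, p_t, size=200_000)
print(draws.mean(), r * np.exp(s_t))   # both close to 3*exp(0.5) ≈ 4.95
```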
To complete the model specification for the count time series after the observation density
p(y_t | s_t) has been chosen, it remains to describe the distribution of the state process
{S_t}. Often the primary objective in modeling count time series is to describe associations
between the observations and a suite of covariates. The covariates may be functions of time
(e.g., a day-of-the-week effect) or exogenous (e.g., climatic effects). Since the Y_t's are
assumed to be conditionally independent given the state process, the mean structure and the
dependence in the time series can be modeled entirely through the state process. A natural
choice is to use a regression model with time series errors for S_t. If
x_t = (1, x_{t1}, ..., x_{tp})^T denotes the vector of covariates associated with the tth
observation,† then a regression time series model for S_t is given by

    S_t = x_t^T β + α_t,    (6.7)

* One can use other choices for the link function besides the canonical one, as in the
negative binomial example. See McCullagh and Nelder (1989) for more details.
† Our covariates always include an intercept term.
where β = (β_0, ..., β_p)^T is the vector of regression parameters and {α_t} is a strictly
stationary time series with zero mean. Sometimes the {α_t} process, or {S_t} itself, is
referred to as a latent process since it is not directly observed. Usually, but not always,
one takes {α_t} to be a strictly stationary Gaussian time series, for which there is an
explicit expression for the joint distribution of S^{(n)} = (S_1, ..., S_n)^T. In this case,
writing Γ_n = cov(S^{(n)}, S^{(n)}), the joint density of Y^{(n)} is given by

    p(y^{(n)}) = ∫_{R^n} ∏_{t=1}^{n} p(y_t | x_t^T β + α_t)
                 × (2π)^{−n/2} |Γ_n|^{−1/2} e^{−(1/2)(s^{(n)} − Xβ)^T Γ_n^{−1} (s^{(n)} − Xβ)} ds^{(n)},    (6.8)

where X = (x_1, ..., x_n)^T is the design matrix and α^{(n)} = (α_1, ..., α_n)^T. For
estimation purposes, it is convenient to express (6.8) as a likelihood function, writing it
in the form

    L(θ) = ∫_{R^n} ∏_{t=1}^{n} p(y_t | x_t^T β + α_t)
           × (2π)^{−n/2} |Γ_n|^{−1/2} e^{−(1/2)(α^{(n)})^T Γ_n^{−1} α^{(n)}} dα^{(n)},    (6.9)

where the covariance matrix Γ_n = Γ_n(ψ) depends on the parameter vector ψ and
θ^T = (β^T, ψ^T) denotes the complete parameter vector.
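The integral (6.9) has no closed form, but for small n it can be approximated by brute-force
Monte Carlo: draw α^{(n)} from its N(0, Γ_n) distribution and average the conditional
likelihood over the draws. The sketch below does this for a Poisson observation density with
a latent Gaussian AR(1) process and a linear trend covariate; all parameter values are
illustrative assumptions, and this naive sampler is exactly what the importance sampling
methods of Section 6.2.5 are designed to improve upon.

```python
import numpy as np

rng = np.random.default_rng(4)

# Model (6.7) with a Poisson observation density: S_t = x_t' beta + alpha_t,
# Y_t | S_t ~ Poisson(exp(S_t)), {alpha_t} a Gaussian AR(1) latent process.
n = 50
phi, tau2 = 0.6, 0.2                 # latent AR(1) parameters (the vector psi)
beta = np.array([0.5, 1.0])          # intercept and trend coefficients
X = np.column_stack([np.ones(n), np.arange(1, n + 1) / n])   # design matrix

alpha = np.empty(n)
alpha[0] = rng.normal(scale=np.sqrt(tau2 / (1 - phi**2)))
for t in range(1, n):
    alpha[t] = phi * alpha[t - 1] + rng.normal(scale=np.sqrt(tau2))
y = rng.poisson(np.exp(X @ beta + alpha))

def loglik_mc(beta, phi, tau2, y, X, n_draws=2000):
    """Naive Monte Carlo approximation of log L(theta) in (6.9): average the
    conditional likelihood over draws of alpha^(n) from N(0, Gamma_n).
    Additive constants in y (the log(y!) terms) are omitted."""
    n = len(y)
    # Gamma_n for a stationary AR(1): cov(alpha_s, alpha_t) = tau2*phi^|s-t|/(1-phi^2).
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Gamma = tau2 * phi**lags / (1 - phi**2)
    draws = rng.multivariate_normal(np.zeros(n), Gamma, size=n_draws)
    s = X @ beta + draws                 # n_draws x n matrix of latent states
    logp = (y * s - np.exp(s)).sum(axis=1)   # sum_t log p(y_t | s_t), per draw
    m = logp.max()
    return m + np.log(np.mean(np.exp(logp - m)))   # log of the MC average

print(loglik_mc(beta, phi, tau2, y, X))
```

The log-sum-exp step guards against underflow when n is moderately large; even so, the
variance of this naive estimator grows quickly with n, which motivates the importance
sampling and Laplace-based methods discussed later in the chapter.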
The SSM framework of (6.2), in which the conditional distribution in the observation
equation is given by a known family of discrete distributions, such as the Poisson, has a
number of desirable features. First, the setup is virtually identical to the starting point
of a Bayesian hierarchical model. Conditional on a state process, which in the Poisson case
might be the intensity process, the observations are assumed to be conditionally independent
and Poisson distributed. Second, the serial dependence is then modeled entirely through the
state equation. This represents a pleasing physical description for count data, which in the
Poisson case fits under the umbrella of Cox processes, or doubly stochastic Poisson
processes. As in most Bayesian modeling settings, the unconditional distribution of the
observation, obtained by integrating out the state variable, rarely has an explicit form.
Except in a limited number of cases, it is rare that the unconditional distribution of Y_t
is of primary interest. The modeling emphasis in the SSM specification is on the choice of
conditional distribution in the observation equation and the model for the state process
{S_t}. An overview of tests of the existence of a latent process and estimates of its
underlying correlation structure can be found in Davis et al. (1999).
The autocorrelation function (ACF) is the workhorse for describing dependence and model
fitting for continuous response data using linear time series models. For nonlinear time
series models, including count time series, the ACF plays a more limited role. For example,
for financial time series, where Y_t now represents the daily log-returns of an asset or
exchange rate on day t, the data are typically uncorrelated and the ACF of the data is not
particularly useful. On the other hand, the ACF of the absolute values and squares of the
time series {Y_t} can be quite useful in describing other types of serial dependence. For
time series of counts following the SSM model described above, the ACF can also be used as
a measure of dependence but is not always useful. In some cases, the ACF of {Y_t} can be
expressed explicitly in terms of the ACF of the state process.
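For instance, in the Poisson case with a latent Gaussian AR(1) process, the ACF of the
counts is typically attenuated relative to the ACF of the state process. A small simulation
(with illustrative parameter values) makes this visible:

```python
import numpy as np

rng = np.random.default_rng(5)

# Compare the sample ACF of a simulated Poisson SSM count series with the
# sample ACF of its latent Gaussian AR(1) state process.
n, phi, tau2 = 5000, 0.8, 0.3
alpha = np.empty(n)
alpha[0] = rng.normal(scale=np.sqrt(tau2 / (1 - phi**2)))
for t in range(1, n):
    alpha[t] = phi * alpha[t - 1] + rng.normal(scale=np.sqrt(tau2))
y = rng.poisson(np.exp(alpha))

def acf(x, lag):
    """Sample autocorrelation at the given lag."""
    x = x - x.mean()
    return (x[:-lag] * x[lag:]).sum() / (x * x).sum()

for h in (1, 2, 3):
    print(h, acf(alpha, h), acf(y, h))  # count ACF positive but attenuated
```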