6
State Space Models for Count Time Series
Richard A. Davis and William T.M. Dunsmuir
CONTENTS
6.1 Introduction...................................................................................121
6.2 Approaches to Estimation......................................................................126
    6.2.1 Estimating Equations...................................................................127
    6.2.2 Likelihood-based Methods...............................................................128
    6.2.3 Earlier Monte Carlo and Approximate Methods............................................129
    6.2.4 Laplace and Gaussian Approximations....................................................130
    6.2.5 Importance Sampling....................................................................133
          6.2.5.1 Importance Sampling based on Laplace Approximation...........................133
          6.2.5.2 Importance Sampling based on Gaussian Approximation..........................134
          6.2.5.3 Efficient Importance Sampling................................................134
    6.2.6 Composite Likelihood...................................................................135
6.3 Applications to Analysis of Polio Data.......................................................138
    6.3.1 Estimate of Trend Coefficient β̂_2.....................................................139
    6.3.2 Estimate of Latent Process Parameters..................................................140
    6.3.3 Comparisons of Computational Speed.....................................................140
    6.3.4 Some Recommendations...................................................................140
6.4 Forecasting..................................................................................141
References.......................................................................................143
6.1 Introduction
The family of linear state-space models (SSMs), which has been a staple of the time series
literature for more than 70 years, provides a flexible modeling framework that is applicable
to a wide range of time series. The popularity of these models stems in large part from the
development of the Kalman recursions, which provide a quick updating scheme for predicting,
filtering, and smoothing a time series. In addition, many commonly used time series models,
such as univariate and multivariate ARMA and ARIMA processes, can be embedded in an SSM and
as such can take advantage of the fast recursive calculations for prediction and filtering
afforded by the Kalman recursions. Recent accounts of linear state space models can be found
in Brockwell and Davis (1991), Brockwell and Davis (2002), Durbin and Koopman (2012), and
Shumway and Stoffer (2011).
122 Handbook of Discrete-Valued Time Series
SSMs can be described via two equations: the state equation, which describes the evolution
of the state S_t of the system, and an observation equation, which describes the relationship
between the observed value Y_t of the time series and the state variable S_t. In this chapter,
we shall assume that the state process evolves according to its own probabilistic mechanism;
that is, the conditional distribution of S_t given all past states and past observations
depends only on the past states. More formally, the conditional distribution of S_t given
S_{t-1}, S_{t-2}, ..., Y_{t-1}, Y_{t-2}, ... is the same as the conditional distribution of
S_t given S_{t-1}, S_{t-2}, .... The case in which the conditional distribution depends on
previous observations is considered in other chapters in this volume (see Tjøstheim [2015;
Chapter 4 in this volume]). See also Brockwell and Davis (2002) for a treatment of this more
general case.
For the linear SSM specification, the observation equation and its companion state equation,
which govern the univariate observation process {Y_t} and the time evolution of the
s-dimensional state variables {S_t}, are specified via a linear relationship. One of the
simplest linear SSM specifications is

    Observation equation: Y_t = G S_t + V_t,  t = 1, 2, ...,  {V_t} ~ IID(0, σ²),
    State equation:       S_{t+1} = F S_t + U_t,  t = 1, 2, ...,  {U_t} ~ IID(0, R),    (6.1)

where

    IID(0, S) denotes an independent and identically distributed sequence of random
    variables (vectors) with mean 0 and variance (covariance matrix) S,
    G is a 1 × s dimensional matrix,
    F is an s × s dimensional matrix, and
    the sequence {Y_1, (U_t, V_t), t = 1, 2, ...} is IID, where U_t and V_t are allowed
    to be dependent.
Often, one assumes that the matrix F in the state equation has eigenvalues that are less
than one in absolute value, in which case S_t is viewed as a causal vector AR(1) process
(see Brockwell and Davis, 1991). When the noise terms {U_t} and {V_t} are Gaussian, this
state-space system is referred to as a Gaussian SSM.
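As a concrete illustration of (6.1) and the Kalman recursions mentioned above, the following
sketch simulates a one-dimensional Gaussian SSM and runs the standard Kalman filter to
produce one-step state predictions. The parameter values (F = 0.7, G = 1, and so on) are
arbitrary choices for illustration, not values taken from this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model (6.1) in one dimension: Y_t = G*S_t + V_t,  S_{t+1} = F*S_t + U_t.
G, F = 1.0, 0.7          # |F| < 1, so {S_t} is a causal AR(1)
sigma2, R = 0.5, 1.0     # var(V_t) and var(U_t)
n = 200

# Simulate the state and observation processes.
S = np.empty(n)
S[0] = rng.normal(scale=np.sqrt(R / (1 - F**2)))   # stationary start
for t in range(1, n):
    S[t] = F * S[t - 1] + rng.normal(scale=np.sqrt(R))
Y = G * S + rng.normal(scale=np.sqrt(sigma2), size=n)

# Kalman filter: one-step prediction of S_t given Y_1, ..., Y_{t-1}.
s_pred, P_pred = 0.0, R / (1 - F**2)   # stationary prior for S_1
preds = []
for t in range(n):
    preds.append(s_pred)
    K = P_pred * G / (G**2 * P_pred + sigma2)      # Kalman gain
    s_filt = s_pred + K * (Y[t] - G * s_pred)      # filter with Y_t
    P_filt = (1 - K * G) * P_pred
    s_pred = F * s_filt                            # predict S_{t+1}
    P_pred = F**2 * P_filt + R

print(np.mean((np.array(preds) - S) ** 2))  # prediction MSE, below var(S_t)
```

The point of the exercise is that the filter's prediction error is smaller than the
unconditional state variance R/(1 − F²), which is what the recursions buy over ignoring
the data.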
It is well known (see, e.g., Proposition 2.1 in Davis and Rosenblatt [1991]) that, except
in the degenerate case, the marginal distribution of a univariate autoregressive process
with iid noise is continuous. So if one considers a stationary solution of (6.1) in the
one-dimensional case, S_t has a continuous distribution. Consequently, the distribution of
Y_t in (6.1) cannot have a discrete component. As such, this precludes the use of the linear
SSM as specified in (6.1) for modeling count time series.
Nonlinear SSMs possess a similar structure to the linear state space model. The linear
equations are now replaced by specifying the appropriate conditional distributions. Writing
Y^{(t)} = (Y_1, ..., Y_t)' and S^{(t)} = (S_1, ..., S_t)', the observation and state
equations become:

    Observation equation: p(y_t | s_t) = p(y_t | s_t, s^{(t-1)}, y^{(t-1)}),  t = 1, 2, ...,    (6.2)
    State equation:       p(s_{t+1} | s_t) = p(s_{t+1} | s_t, s^{(t-1)}, y^{(t)}),  t = 1, 2, ...,    (6.3)
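The conditional densities (6.2) and (6.3) determine a forward simulation scheme: draw S_1
from its initial density, then alternate between drawing Y_t given S_t and advancing the
state. A minimal sketch, with hypothetical concrete choices (a Gaussian AR(1) state density
and a Poisson observation density; the parameter values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical concrete choices for (6.2) and (6.3):
#   state:       S_{t+1} | S_t = s_t  ~  N(phi * s_t, tau2)   (Gaussian AR(1))
#   observation: Y_t | S_t = s_t      ~  Poisson(exp(s_t))
phi, tau2, n = 0.8, 0.3, 100

s = rng.normal(scale=np.sqrt(tau2 / (1 - phi**2)))   # draw S_1 from p_1
y = []
for t in range(n):
    y.append(rng.poisson(np.exp(s)))                 # Y_t depends only on S_t
    s = phi * s + rng.normal(scale=np.sqrt(tau2))    # advance the state
```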
where p(y_t | s_t) and p(s_{t+1} | s_t) are prespecified probability density functions.†
The joint density for the observations and state components can be rewritten (Brockwell and
Davis, 2002, Section 8.8) as

    p(y^{(n)}, s^{(n)}) = ∏_{j=1}^{n} p(y_j | s_j) ∏_{j=2}^{n} p(s_j | s_{j-1}) p_1(s_1),

where p_1(·) is a pdf for the initial state vector S_1. Using the fact that {S_t} is a
Markov process, it follows that

    p(y^{(n)} | s^{(n)}) = ∏_{j=1}^{n} p(y_j | s_j).    (6.4)
Even though the nonlinear state-space formulation given in (6.2) and (6.3) is quite general,
the specification can be limiting in that the state process is required to be Markov. An
alternative specification of a general SSM can be fashioned using (6.4) as a starting point.
That is, given values for the state process S_1, ..., S_n, the random variables Y_1, ..., Y_n
are assumed to be conditionally independent in the sense of (6.4). Using (6.4) and any joint
density p(s^{(n)}) for S^{(n)}, the joint density of Y^{(n)} is then

    p(y^{(n)}) = ∫_{R^n} ∏_{j=1}^{n} p(y_j | s_j) p(s^{(n)}) ds^{(n)}.    (6.5)

So in general, the distribution of the state process can be freely specified and, in
particular, the state process need not be Markov.
For count time series, the observation probability density function must have support on
a subset of {0, 1, 2, ...} and is generally chosen to be one of the common discrete
distribution functions (i.e., Poisson, negative binomial, Bernoulli, geometric, etc.), where
the parameter of the discrete distribution is a function of the state S_t. One common choice
is to assume that the conditional distribution of Y_t given S_t is a member of a
one-dimensional exponential family with canonical parameter s_t, that is,

    p(y_t | s_t) = exp{s_t y_t − b(s_t) + c(y_t)},  y_t = 0, 1, ...,    (6.6)

where b(·), c(·) are known real functions and s_t is the canonical parameter (see McCullagh
and Nelder, 1989). Recall that for an exponential family as specified in (6.6), we have the
following properties:
    E(Y_t | S_t = s_t) = b'(s_t),
    var(Y_t | S_t = s_t) = b''(s_t),

where b' and b'' refer to the first and second derivatives. The canonical link function g
for an exponential family model maps the mean function into the canonical parameter, that
is, g(μ) = (b')^{−1}(μ). In other words, the link function is the inverse of the conditional
mean function, which provides the mapping that expresses the state as a function of the
conditional mean.

† These are probability density functions relative to some dominating measure, which is
usually taken to be counting measure on {0, 1, ...} for p(y_t | s_t) and Lebesgue measure
for p(s_t | s_{t-1}).
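These two moment identities are easy to check numerically. For the Poisson case below, where
b(s) = e^s so that b'(s) = b''(s) = e^s, a quick simulation-based sanity check (the value
s = 0.4 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)

# Poisson in canonical form (6.6): b(s) = exp(s), c(y) = -log(y!).
s = 0.4                       # a fixed value of the canonical parameter
b1 = np.exp(s)                # b'(s): conditional mean
b2 = np.exp(s)                # b''(s): conditional variance

draws = rng.poisson(np.exp(s), size=200_000)
print(draws.mean(), b1)       # both close to exp(0.4) ≈ 1.49
print(draws.var(), b2)

# The canonical link g(mu) = log(mu) recovers the canonical parameter.
assert np.isclose(np.log(b1), s)
```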
Examples:

1. Poisson: p(y_t | s_t) = exp{s_t y_t − e^{s_t} − log(y_t!)}, y_t = 0, 1, .... In this
   case E(Y_t | s_t) = exp{s_t} and hence the link function is g(μ) = log μ.
2. Binomial: Assuming the number of trials in the binomial at time t is m_t, which is known
   but could vary with t, then

       p(y_t | s_t) = exp{s_t y_t − m_t log(1 + e^{s_t}) + log (m_t choose y_t)},
       y_t = 0, 1, ..., m_t,

   where the probability of success on a single trial is p_t = e^{s_t}/(1 + e^{s_t}). In
   this case, b_t(s_t) = m_t log(1 + e^{s_t}), E(Y_t | S_t = s_t) = m_t e^{s_t}/(1 + e^{s_t}),
   and the link is the logit function, g(p_t) = logit(p_t) = log(p_t/(1 − p_t)).
3. Negative binomial: The NegBin(r, p) distribution has density function

       p(y; r, p) = exp{log(1 − p) y + r log p + log (y + r − 1 choose r − 1)},
       y = 0, 1, 2, ...,

   with mean r(1 − p)/p. If we use the natural parameterization in which s_t represents the
   canonical parameter, that is, s_t = log(1 − p_t), then we see that s_t < 0. From a
   modeling perspective, this may be too restrictive. Allowing S_t to be a linear time
   series model that takes both positive and negative values permits greater flexibility in
   terms of the behavior of S_t. So instead of using the canonical link function, consider
   the alternative link function g̃(μ) = −log(r/μ) = −log(p/(1 − p)) = −logit(p). Setting
   s_t = g̃(μ_t) = −logit(p_t), we obtain p_t = e^{−s_t}/(1 + e^{−s_t}), so that the
   conditional density of Y_t given S_t = s_t becomes

       p(y_t | s_t) = exp{−log(1 + e^{−s_t}) y_t + r log(e^{−s_t}/(1 + e^{−s_t}))
                          + log (y_t + r − 1 choose r − 1)},  y_t = 0, 1, 2, ...,

   with conditional mean E(Y_t | s_t) = r(1 − p_t)/p_t = r e^{s_t}.
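Under the alternative link, taking p_t = e^{−s_t}/(1 + e^{−s_t}) (our reading of the
parameterization above), the negative binomial mean r(1 − p_t)/p_t reduces to r e^{s_t},
which now increases in s_t just as in the Poisson case. A quick simulation check with
arbitrary values of r and s_t:

```python
import numpy as np

rng = np.random.default_rng(3)

# Alternative link for the negative binomial: s_t = -logit(p_t), so
# p_t = exp(-s_t)/(1 + exp(-s_t)) and E(Y_t | s_t) = r*(1 - p_t)/p_t = r*exp(s_t).
r, s_t = 3, 0.5
p_t = np.exp(-s_t) / (1 + np.exp(-s_t))

# numpy's negative_binomial counts failures before r successes: mean r*(1-p)/p.
draws = rng.negative_binomial(r, p_t, size=200_000)
print(draws.mean(), r * np.exp(s_t))   # both close to 3*exp(0.5) ≈ 4.95
```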
To complete the model specification for the count time series after the observation density
p(y_t | s_t) has been chosen, it remains to describe the distribution of the state process
{S_t}. Often the primary objective in modeling count time series is to describe associations
between the observations and a suite of covariates. The covariates may be functions of time
(e.g., a day-of-the-week effect) or exogenous (e.g., climatic effects). Since the Y_t's are
assumed to be conditionally independent given the state process, the mean structure and the
dependence in the time series can be modeled entirely through the state process. A natural
choice is to use a regression model with time series errors for S_t. If
x_t = (1, x_{t1}, ..., x_{tp})^T denotes the vector of covariates associated with the tth
observation,† then a regression time series model for S_t is given by

    S_t = x_t^T β + α_t,    (6.7)

* One can use other choices for the link function besides the canonical one, as in the
negative binomial example. See McCullagh and Nelder (1989) for more details.
† Our covariates always include an intercept term.
where β = (β_0, ..., β_p)^T is the vector of regression parameters and {α_t} is a strictly
stationary time series with zero mean. Sometimes the {α_t} process, or {S_t} itself, is
referred to as a latent process since it is not directly observed. Usually, but not always,
one takes {α_t} to be a strictly stationary Gaussian time series, for which there is an
explicit expression for the joint distribution of S^{(n)} = (S_1, ..., S_n)^T. In this case,
writing Γ_n = cov(S^{(n)}, S^{(n)}), the joint density of Y^{(n)} is given by

    p(y^{(n)}) = ∫_{R^n} ∏_{t=1}^{n} p(y_t | x_t^T β + α_t)
                 × (2π)^{−n/2} |Γ_n|^{−1/2} e^{−(1/2)(s^{(n)} − Xβ)^T Γ_n^{−1} (s^{(n)} − Xβ)} ds^{(n)},    (6.8)

where X = (x_1, ..., x_n)^T is the design matrix and α^{(n)} = (α_1, ..., α_n)^T. For
estimation purposes, it is convenient to express (6.8) as a likelihood function, writing it
in the form

    L(θ) = ∫_{R^n} ∏_{t=1}^{n} p(y_t | x_t^T β + α_t)
           × (2π)^{−n/2} |Γ_n|^{−1/2} e^{−(1/2)(α^{(n)})^T Γ_n^{−1} α^{(n)}} dα^{(n)},    (6.9)

where the covariance matrix Γ_n = Γ_n(ψ) depends on the parameter vector ψ and
θ^T = (β^T, ψ^T) denotes the complete parameter vector.
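The integral (6.9) has no closed form, but for small n it can be approximated by brute-force
Monte Carlo: draw α^{(n)} from its N(0, Γ_n) distribution and average the conditional
likelihood over the draws. The sketch below does this for a Poisson observation density with
a latent Gaussian AR(1) process and a linear trend covariate; all parameter values are
illustrative assumptions, and this naive sampler is exactly what the importance sampling
methods of Section 6.2.5 are designed to improve upon.

```python
import numpy as np

rng = np.random.default_rng(4)

# Model (6.7) with a Poisson observation density: S_t = x_t' beta + alpha_t,
# Y_t | S_t ~ Poisson(exp(S_t)), {alpha_t} a Gaussian AR(1) latent process.
n = 50
phi, tau2 = 0.6, 0.2                 # latent AR(1) parameters (the vector psi)
beta = np.array([0.5, 1.0])          # intercept and trend coefficients
X = np.column_stack([np.ones(n), np.arange(1, n + 1) / n])   # design matrix

alpha = np.empty(n)
alpha[0] = rng.normal(scale=np.sqrt(tau2 / (1 - phi**2)))
for t in range(1, n):
    alpha[t] = phi * alpha[t - 1] + rng.normal(scale=np.sqrt(tau2))
y = rng.poisson(np.exp(X @ beta + alpha))

def loglik_mc(beta, phi, tau2, y, X, n_draws=2000):
    """Naive Monte Carlo approximation of log L(theta) in (6.9): average the
    conditional likelihood over draws of alpha^(n) from N(0, Gamma_n).
    Additive constants in y (the log(y!) terms) are omitted."""
    n = len(y)
    # Gamma_n for a stationary AR(1): cov(alpha_s, alpha_t) = tau2*phi^|s-t|/(1-phi^2).
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Gamma = tau2 * phi**lags / (1 - phi**2)
    draws = rng.multivariate_normal(np.zeros(n), Gamma, size=n_draws)
    s = X @ beta + draws                 # n_draws x n matrix of latent states
    logp = (y * s - np.exp(s)).sum(axis=1)   # sum_t log p(y_t | s_t), per draw
    m = logp.max()
    return m + np.log(np.mean(np.exp(logp - m)))   # log of the MC average

print(loglik_mc(beta, phi, tau2, y, X))
```

The log-sum-exp step guards against underflow when n is moderately large; even so, the
variance of this naive estimator grows quickly with n, which motivates the importance
sampling and Laplace-based methods discussed later in the chapter.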
The SSM framework of (6.2), in which the conditional distribution in the observation
equation is given by a known family of discrete distributions, such as the Poisson, has a
number of desirable features. First, the setup is virtually identical to the starting point
of a Bayesian hierarchical model. Conditional on a state process, which in the Poisson case
might be the intensity process, the observations are assumed to be conditionally independent
and Poisson distributed. Second, the serial dependence is then modeled entirely through the
state equation. This represents a pleasing physical description for count data, which in the
Poisson case fits under the umbrella of Cox processes, or doubly stochastic Poisson
processes. As in most Bayesian modeling settings, the unconditional distribution of the
observation, obtained by integrating out the state variable, rarely has an explicit form.
Except in a limited number of cases, it is rare that the unconditional distribution of Y_t
is of primary interest. The modeling emphasis in the SSM specification is on the choice of
conditional distribution in the observation equation and the model for the state process
{S_t}. An overview of tests of the existence of a latent process and estimates of its
underlying correlation structure can be found in Davis et al. (1999).
The autocorrelation function (ACF) is the workhorse for describing dependence and model
fitting for continuous response data using linear time series models. For nonlinear time
series models, including count time series, the ACF plays a more limited role. For example,
for financial time series, where Y_t now represents the daily log-returns of an asset or
exchange rate on day t, the data are typically uncorrelated and the ACF of the data is not
particularly useful. On the other hand, the ACF of the absolute values and squares of the
time series {Y_t} can be quite useful in describing other types of serial dependence. For
time series of counts following the SSM model described above, the ACF can also be used as
a measure of dependence but is not always useful. In some cases, the ACF of {Y_t} can be
expressed explicitly in terms of the ACF of the state process.
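For instance, in the Poisson case with a latent Gaussian AR(1) process, the ACF of the
counts is typically attenuated relative to the ACF of the state process. A small simulation
(with illustrative parameter values) makes this visible:

```python
import numpy as np

rng = np.random.default_rng(5)

# Compare the sample ACF of a simulated Poisson SSM count series with the
# sample ACF of its latent Gaussian AR(1) state process.
n, phi, tau2 = 5000, 0.8, 0.3
alpha = np.empty(n)
alpha[0] = rng.normal(scale=np.sqrt(tau2 / (1 - phi**2)))
for t in range(1, n):
    alpha[t] = phi * alpha[t - 1] + rng.normal(scale=np.sqrt(tau2))
y = rng.poisson(np.exp(alpha))

def acf(x, lag):
    """Sample autocorrelation at the given lag."""
    x = x - x.mean()
    return (x[:-lag] * x[lag:]).sum() / (x * x).sum()

for h in (1, 2, 3):
    print(h, acf(alpha, h), acf(y, h))  # count ACF positive but attenuated
```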