if the observation equation is Poisson and {S_t} is a stationary Gaussian process with mean β_1 and autocovariance function γ_S(h), then (see Davis et al., 2000)

$$\rho_Y(h) = \mathrm{Cor}(Y_t, Y_{t+h}) = \frac{\exp\{\gamma_S(h)\} - 1}{e^{-\beta_1} + \big(\exp\{\gamma_S(0)\} - 1\big)}.$$
Moreover, if γ_S(h) > 0, then

$$0 \le \rho_Y(h) \le \frac{\gamma_S(h)}{\gamma_S(0)} = \rho_S(h).$$
Thus, little or no autocorrelation in the data can mask potentially large correlation in the latent process.
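To make the masking effect concrete, here is a minimal numerical sketch in Python; the parameter values for β_1 and γ_S are assumed purely for illustration and simply evaluate the displayed formula.

```python
import numpy as np

# Illustrative (assumed) values: strong latent autocorrelation at lag h,
# but a low mean count level, which strengthens the masking effect.
beta1 = -1.0              # mean of the latent process {S_t}
gamma0 = 1.0              # latent variance gamma_S(0)
rho_S = 0.9               # latent autocorrelation rho_S(h)
gamma_h = gamma0 * rho_S  # latent autocovariance gamma_S(h)

# Observed-count autocorrelation from the formula above.
rho_Y = (np.exp(gamma_h) - 1) / (np.exp(-beta1) + (np.exp(gamma0) - 1))
print(f"rho_S(h) = {rho_S:.2f}  ->  rho_Y(h) = {rho_Y:.3f}")
# Prints rho_Y(h) = 0.329: strong latent dependence looks weak in the counts.
```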
In Section 6.2, we consider various methods of estimation for regression parameters and the parameters determining the covariance matrix function for count time series satisfying (6.4) and (6.7). The main issue is that the joint distribution is given by the n-fold integrals in (6.5) or (6.9), which can be difficult to compute numerically for a fixed set of parameter values, let alone to maximize over the parameter space. Strategies for finding maximum likelihood estimators include Laplace-style approximations to the integral in (6.5) and simulation-based procedures using either MCMC or importance sampling. Alternative estimation procedures based on estimating equations and composite likelihood procedures are also discussed in Section 6.2.
Gamerman et al. (2015; Chapter 8 in this volume) consider the formulation and estimation of dynamic Bayesian models. These models combine state-space dynamics with generalized linear models (GLMs). Such models can include covariates where now the coefficients evolve dynamically. Earlier work on dynamic generalized linear models can be found in Fahrmeir (1992). In the SSM setup of this chapter, the dynamics of the state process {S_t} are specified independently of the observed process. The book by Durbin and Koopman (2012) is an excellent reference on SSMs in a general setting.
A different approach, which is discussed by Fokianos (2015; Chapter 1 in this volume) and Tjøstheim (2015; Chapter 4 in this volume), allows the state process {S_t} to be an explicit function of previous observations. Estimation for these models tends to be simpler than for the models considered in this chapter. On the other hand, incorporating regressors in a meaningful way in these models, as well as developing the underlying theory for them, is far more difficult.
In the remainder of this chapter, estimation procedures are described in Section 6.2 and applied to the Polio data, a classical count time series data set, in Section 6.3. Forecasting for these models is treated in Section 6.4.
6.2 Approaches to Estimation
Not surprisingly, given the numerical complexity associated with approximating the high-dimensional integral required to compute the likelihood (6.9), many different approaches to estimation of nonlinear non-Gaussian models have been used. These include: GLM methods for estimating regression parameters β in which the latent process is ignored, generalized estimating equation (GEE) methods, which use a working covariance matrix to
adjust inference about β for serial dependence effects, composite likelihood methods in which various lower dimensional marginal distributions are combined to define an objective function to be maximized over both β and ψ, approximations to the likelihood such as the Laplace approximation or the penalized quasi likelihood (PQL) method, and use of importance sampling and other Monte Carlo methods. In this section, the main methods
are reviewed and compared. Bayesian methods are not covered in this chapter but are reviewed elsewhere in this volume (see, for example, Chapters 8 and 11). However, the
importance sampling methods to be discussed in this section are also used to implement
Bayesian methods.
6.2.1 Estimating Equations
As a precursor to using a fully efficient estimation method for the SSM, inference based on estimating equations can be useful. Zeger (1988) was one of the first to develop GEE methods for Poisson time series (see also Thavaneswaran and Ravishanker [2015; Chapter 7 in this volume]). He proposed finding an estimate of β by solving
$$U_{\mathrm{GEE}}(\beta) = \frac{\partial \mu^T}{\partial \beta}\, V^{-1}(\beta, \psi)\,\big(y^{(n)} - \mu\big) = 0, \qquad (6.10)$$

where $\mu^T = \big(\exp(x_1^T \beta), \ldots, \exp(x_n^T \beta)\big)$ and V(β, ψ) is a working covariance matrix selected to reflect the latent process autocorrelation structure (with parameters ψ) and the variance of the observations.
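As an illustration, the following Python sketch solves (6.10) by Fisher scoring for the Poisson case with log link. The AR(1) working correlation R, the working variance sigma2, and all numerical values are assumptions made for the example; in Zeger's (1988) method, the latent-process parameters ψ would themselves be estimated by the method of moments.

```python
import numpy as np

def gee_poisson(y, X, R, sigma2, n_iter=100, tol=1e-8):
    """Sketch: solve the GEE (6.10) by Fisher scoring for a Poisson model
    with log link. R is a working correlation matrix for the latent
    process and sigma2 its working variance (both taken as given here)."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        D = mu[:, None] * X                              # d(mu)/d(beta)
        V = np.diag(mu) + sigma2 * np.outer(mu, mu) * R  # working Var(y)
        Vinv_D = np.linalg.solve(V, D)
        Vinv_r = np.linalg.solve(V, y - mu)
        step = np.linalg.solve(D.T @ Vinv_D, D.T @ Vinv_r)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical usage: intercept plus scaled trend, AR(1) working correlation.
rng = np.random.default_rng(0)
n = 168
X = np.column_stack([np.ones(n), np.arange(1, n + 1) / n])
R = 0.8 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
y = rng.poisson(np.exp(X @ np.array([0.2, -0.5])))  # toy counts
print(gee_poisson(y, X, R, sigma2=0.5))
```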
Zeger (1988) shows that, in the Poisson case, the method is consistent and asymptotically normal with appropriate estimates of the covariance matrix. He applies the method to the Polio data using an AR(1) specification for the lognormal latent process. He also develops a method of moments approach to estimating the variance and autocovariances of the latent process. Davis et al. (1999) develop a bias adjustment for this method based on the GLM estimates of β, which we discuss next.
The most elementary method for estimating β alone uses the GLM estimating equation derived assuming that there is no latent process in (6.7), i.e., it is falsely assumed that $S_t = x_t^T \beta$ so that the $Y_t$ are independent and not just conditionally so. Therefore, the GLM estimator $\hat{\beta}_n$ of β solves the score equation corresponding to the quasilikelihood under the independence assumption:
$$U_{\mathrm{GLM}}(\beta) = X^T\big(y^{(n)} - \mu\big) = 0, \qquad (6.11)$$
where $x_t^T$ is the tth row of the design matrix X. See Davis et al. (1999, 2000) for the Poisson case, Davis and Wu (2009) for the negative binomial case in particular and some other members of the exponential family, and Wu and Cui (2013) for the binary case. GLM provides preliminary estimates of the regression parameters needed to assess the existence of a latent process and the nature of its autocorrelation structure, at least for the Poisson case, as discussed in Davis et al. (2000).
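For the Poisson case, these preliminary GLM estimates can be computed with standard software. The sketch below uses Python's statsmodels on simulated placeholder data (not the Polio series); note the caveat about the reported standard errors, discussed next.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data: Poisson counts with an intercept and scaled trend t/n.
rng = np.random.default_rng(1)
n = 168
X = np.column_stack([np.ones(n), np.arange(1, n + 1) / n])
y = rng.poisson(np.exp(0.2 - 0.5 * X[:, 1]))  # toy counts, no latent part

# GLM fit of (6.11): the latent process is deliberately ignored.
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)  # preliminary estimate of beta
# NOTE: fit.bse assumes independent observations; with a latent process
# present, these standard errors can be seriously misleading (see below).
```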
The central limit theorem has also been established for GLM estimates for the Poisson
case (Davis et al., 2000), the negative binomial case (Davis and Wu, 2009) and the binary
case (Wu and Cui, 2013). For the Poisson case, Davis et al. (2000) provide formulae for
estimating the asymptotic covariance based on parametric estimators of the latent pro-
cess model autocovariances. They demonstrate that these are usefully accurate through
simulations and applications to data. As pointed out by Wu (2012), use of nonparametric estimates of the autocovariances in a plug-in estimator of the asymptotic covariance will not provide consistent estimators of the asymptotic covariance. Instead, he proposes the use of kernel-based estimates and shows that they are consistent. It is particularly important that the standard errors produced by GLM (as implemented in standard software packages) not be used, because they can be seriously misleading. For example, in the case of the Polio data set to be reviewed in Section 6.3, the standard error reported using GLM for the important linear time trend parameter is 1.40, whereas the correct asymptotic standard error, calculated as just described, is 4.11 (almost 3 times larger).
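A corrected covariance can be obtained from standard estimating-equation (sandwich) asymptotics: the GLM score $X^T(y^{(n)} - \mu)$ has variance $X^T \mathrm{Var}(Y)\, X$ rather than the $X^T \mathrm{diag}(\mu)\, X$ that independence would give. The Python sketch below is in the spirit of, not a transcription of, the Davis et al. (2000) formulae; the AR(1) latent autocovariance and the plug-in values sigma2 and rho are assumed.

```python
import numpy as np

def glm_sandwich_se(X, beta_hat, sigma2, rho):
    """Sketch: sandwich-type standard errors for the Poisson GLM estimate,
    assuming a multiplicative latent process with variance sigma2 and
    AR(1) autocorrelation rho (plug-in values taken as given)."""
    n = X.shape[0]
    mu = np.exp(X @ beta_hat)
    R = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    V_Y = np.diag(mu) + sigma2 * np.outer(mu, mu) * R  # Var(y) under the SSM
    bread = np.linalg.inv(X.T @ (mu[:, None] * X))     # (X' diag(mu) X)^{-1}
    meat = X.T @ V_Y @ X                               # variance of the score
    return np.sqrt(np.diag(bread @ meat @ bread))
```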
In the Poisson and negative binomial cases, consistency and asymptotic normality have been established for the GLM estimate $\hat{\beta}_{\mathrm{GLM}}$ under suitable conditions on the regressors and log-link function. The regressors $x_{nt}$ are typically defined in terms of a smooth function $f: [0, 1] \to \mathbb{R}^d$ on the unit interval, having the form $x_{nt} = f(t/n)$. At least for the Poisson and negative binomial cases, this scaling of the regressors by the sample size is required to prevent degeneracy. Regressors can also include harmonic functions (for seasonal components) as well as observed realizations from stationary random processes. In the above treatments, the latent process {α_t} is required to be a stationary Gaussian ARMA model, or $\epsilon_t = \exp(\alpha_t)$ is required to be a stationary strongly mixing process with finite (4 + δ)th moment. In the Poisson and negative binomial cases with log-link functions, the optimization of the quasi loglikelihood objective function can be performed over all of $\mathbb{R}^d$ since this function is concave.
The same argument can be applied to show that the GLM estimators converge to a limit point and are asymptotically normal when the quasi loglikelihood function is concave. However, in order for the limit point to be the true parameter, the identifiability condition $E\{b'(x_t^T \beta + \alpha_t)\} = b'(x_t^T \beta)$ for all t must be met. This is satisfied for some count response distributions, including the Poisson and negative binomial distributions with log link functions. However, for the binomial response family, this identity cannot hold for any link function when the normal distribution is used for the latent process. The same argument can be applied to show that the GEE method proposed by Zeger (1988) will not provide consistent estimates of β in the binomial case.
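For intuition, consider the Poisson case with log link, so that b'(u) = e^u, and assume (as in Davis et al., 2000) that the latent process is normalized so that E[e^{α_t}] = 1, that is, α_t ∼ N(−σ²/2, σ²). Then

$$E\{b'(x_t^T \beta + \alpha_t)\} = e^{x_t^T \beta}\, E\big[e^{\alpha_t}\big] = e^{x_t^T \beta} = b'(x_t^T \beta),$$

so the condition holds for every t. For the binomial family with logit link, b'(u) = e^u/(1 + e^u), averaging over a normal α_t flattens the response curve, and no adjustment of β restores the identity; this is the source of the bias treated next.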
Correction for the bias in the binary case requires integration over the marginal distribution of α_t, which in turn requires knowledge of its variance, something that is not available from the GLM estimates. Wu and Cui (2013) introduce a modified GLM method that adjusts for the bias in the regression parameter estimates. They introduce a modified estimating equation in which the success probability used in GLM, $\pi_t = 1/(1 + \exp(-x_{nt}^T \beta))$, is replaced by the marginal mean $m(x_{nt}^T \beta) = P(Y_t = 1)$. The monotonically increasing function m(u) is estimated nonparametrically using local linear regression. Wu and Cui (2013) prove that this method produces consistent and asymptotically normal estimates of the regression coefficients. They also provide a method for calculating the asymptotic covariance matrix.
6.2.2 Likelihood-based Methods
The main impediments to using maximum likelihood methods for routine analysis are
threefold:
Optimizing the likelihood: Calculation of the likelihood in (6.9) cannot be done in closed form for moderate sample sizes, let alone large dimensions on the order of magnitude of 1,000 or 10,000. In order to find the maximum likelihood estimator, one
typically resorts to numerical optimization of an approximation to L(θ). These meth-
ods often do not rely on derivative information. If a gradient or Hessian is required,
then another d or d(d + 1)/2 integrals need to be approximated, where d = dim(θ).
Computing standard errors: Once an approximation $\hat{\theta}$ to the MLE has been produced,
standard errors are required for inference purposes. This is especially important
in the model tting stage when perhaps a number of covariates are being consid-
ered for inclusion in the model. Because of the large dimensional integrals dening
the likelihood, approaches using approximations to the Hessian or scoring algo-
rithms are problematic. Often one resorts to numerical approximations to derivatives
of the approximating likelihood evaluated at the estimated value. However, these
estimates can be quite variable and numerically sensitive to the choice of tuning
parameters in the numerical algorithms. Bootstrap methods, in which each bootstrap replicate would require its own n-dimensional integral to be computed, are one possible workaround for computing more reliable standard errors.
Asymptotic theory: There are currently no proofs that the MLE $\hat{\theta}$ is consistent or asymptotically normal. One would certainly expect these properties to hold, but since the
form of the likelihood is rather intractable, the arguments are not standard adap-
tations of existing proofs. Nonetheless, it is important to have a complete theory
worked out for maximum likelihood estimation in these models in order to ensure
that inferences about the parameters are justifiable.
As a result of these practical concerns, a large variety of methods have been proposed to
approximate the likelihood and, to a lesser extent, derivatives of these approximations. The
main approaches in the literature are approximations to the integrand in (6.9), which can
be integrated in closed form to get an approximation to the likelihood. Improvements to
these approximations are typically based on Monte Carlo methods and importance sam-
pling from approximating importance densities. Quasi Monte Carlo (QMC) methods based
on randomly shifted deterministic lattices of points in $\mathbb{R}^n$ have also been applied in recent years. One recent attempt is given in Sinescu et al. (2014), where a Poisson model with a
constant mean plus AR(1) state process is considered. While QMC methods hold promise,
further development is required before they become competitive with other methods
reviewed here.
6.2.3 Earlier Monte Carlo and Approximate Methods
Nelson and Leroux (2006) review various methods in this class for estimating the likelihood function, including the Monte Carlo expectation method first used for the Poisson response AR(1) model in Chan and Ledolter (1995) based on Gibbs sampling for the E-step, a version due to McCulloch based on Metropolis–Hastings sampling, Monte Carlo Newton–Raphson (Kuk and Cheng, 1999) and a modified version of the iterative bias correction
method of Kuk (1995). They compare performance of these methods with the original esti-
mating equations approach of Zeger (1988) and the PQL approach (based on a Laplace
approximation to the likelihood) of Breslow and Clayton (1993) using simulations and by
application to the Polio data in Zeger (1988). Davis and Rodriguez-Yam (2005) also review
the Monte Carlo Expectation Maximization and Monte Carlo Newton–Raphson methods.
Apart from the Bayesian method, all the methods reviewed by Nelson and Leroux provide, as a byproduct of the parameter estimation, estimated covariance matrices under the assumption that the estimates satisfy a central limit theorem.
Nelson and Leroux conclude: “The results have clearly shown that the different methods commonly used at present for fitting a log-linear generalized linear mixed model with an autoregressive random effects correlation structure do yield different sets of parameter estimates, in particular, the parameters related to the random effects distribution.” We will summarize the main points of difference between the various methods in Section 6.3, when they are applied to the Polio data set.
6.2.4 Laplace and Gaussian Approximations
Based on (6.5), the likelihood for the unknown parameters is

$$L(\theta) = \int_{\mathbb{R}^n} e^{F(\alpha, y; \theta)}\, d\alpha, \qquad (6.12)$$
where

$$F(\alpha, y; \theta) = \sum_{t=1}^{n} \big\{\log p\big(y_t \mid x_{nt}^T \beta + \alpha_t\big)\big\} - \frac{1}{2}\,\alpha^T V \alpha + \frac{1}{2}\log\det(V) - \frac{n}{2}\log 2\pi, \qquad (6.13)$$

and $V = \Gamma_n^{-1}$ is the inverse of the covariance matrix of $(\alpha_1, \ldots, \alpha_n)^T$. For many models, the exponent F in (6.12) is unimodal in α. Laplace's approximation replaces the integrand by a normal density that matches that obtained using a second order Taylor series expansion of F around its mode

$$\alpha^{*} = \arg\max_{\alpha} F(\alpha, y; \theta). \qquad (6.14)$$
To nd this mode, the Newton–Raphson method has proved effective for the primary
model considered here. Let F
(α, y; θ) denote the rst derivative vector and F

(α, y; θ) the
matrix of second derivatives both with respect to α. For the conditionally independent
model with a Gaussian latent process of (6.13), it follows that
$$F'(\alpha, y; \theta) = \sum_{t=1}^{n} \frac{\partial}{\partial \alpha} \log p\big(y_t \mid x_t^T \beta + \alpha_t\big) - V\alpha \qquad (6.15)$$

and

$$F''(\alpha, y; \theta) = -(K + V), \qquad (6.16)$$

where, as a result of the conditional independence, K is the diagonal matrix given by

$$K = -\mathrm{diag}\left\{\frac{\partial^2}{\partial \alpha_t^2} \log p(y_t \mid \alpha_t; \theta);\; t = 1, \ldots, n\right\}.$$
Let $\alpha^{(k)}$ be the kth iterate (where dependence on y and θ has been suppressed in the notation). The Newton–Raphson updates are given by

$$\alpha^{(k+1)} = \alpha^{(k)} - \Big[F''\big(\alpha^{(k)}\big)\Big]^{-1} F'\big(\alpha^{(k)}\big). \qquad (6.17)$$
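As an illustration, the Python sketch below implements the recursion (6.17) for the Poisson case with log link and then evaluates the resulting Laplace approximation log L(θ) ≈ F(α*) + (n/2) log 2π − (1/2) log det(K + V). The AR(1) latent covariance and all numerical values in the usage lines are assumptions made for the example.

```python
import numpy as np
from scipy.linalg import toeplitz

def laplace_mode(y, X, beta, V, n_iter=100, tol=1e-10):
    """Newton-Raphson iteration (6.17) for the mode of F(alpha, y; theta)
    in the Poisson/log-link case; V is the precision matrix of alpha."""
    alpha = np.zeros(len(y))
    for _ in range(n_iter):
        lam = np.exp(X @ beta + alpha)       # conditional Poisson means
        grad = (y - lam) - V @ alpha         # F' of (6.15)
        K = np.diag(lam)                     # K of (6.16), Poisson case
        step = np.linalg.solve(K + V, grad)  # -(F'')^{-1} F'
        alpha = alpha + step
        if np.max(np.abs(step)) < tol:
            break
    return alpha, K

def laplace_loglik(y, X, beta, Gamma):
    """Laplace approximation to log L(theta) in (6.12)."""
    n = len(y)
    V = np.linalg.inv(Gamma)                 # precision of the latent vector
    alpha, K = laplace_mode(y, X, beta, V)
    lam = np.exp(X @ beta + alpha)
    # log p(y_t | .) summed, dropping the additive constant -log(y_t!)
    cond = np.sum(y * (X @ beta + alpha) - lam)
    F = (cond - 0.5 * alpha @ V @ alpha
         + 0.5 * np.linalg.slogdet(V)[1] - 0.5 * n * np.log(2 * np.pi))
    return F + 0.5 * n * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(K + V)[1]

# Hypothetical usage with an assumed AR(1) latent covariance Gamma:
rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), np.arange(1, n + 1) / n])
Gamma = 0.5 * toeplitz(0.7 ** np.arange(n))
y = rng.poisson(np.exp(X @ np.array([0.3, 0.5])))  # toy counts
print(laplace_loglik(y, X, np.array([0.3, 0.5]), Gamma))
```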