if the observation equation is Poisson and {S_t} is a stationary Gaussian process with mean β_1 and autocovariance function γ_S(h), then (see Davis et al., 2000)

$$\rho_Y(h) = \mathrm{Cor}(Y_t, Y_{t+h}) = \frac{\exp\{\gamma_S(h)\} - 1}{e^{-\beta_1} + \big(\exp\{\gamma_S(0)\} - 1\big)}.$$
Moreover, if γ_S(h) > 0, then

$$0 \le \rho_Y(h) \le \frac{\gamma_S(h)}{\gamma_S(0)} = \rho_S(h).$$
Thus, little or no autocorrelation in the data can mask potentially large correlation in the latent process.
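To make the masking effect concrete, here is a minimal numerical sketch in Python; the parameter values for β_1 and γ_S are assumed purely for illustration and simply evaluate the displayed formula.

```python
import numpy as np

# Illustrative (assumed) values: strong latent autocorrelation at lag h,
# but a low mean count level, which strengthens the masking effect.
beta1 = -1.0              # mean of the latent process {S_t}
gamma0 = 1.0              # latent variance gamma_S(0)
rho_S = 0.9               # latent autocorrelation rho_S(h)
gamma_h = gamma0 * rho_S  # latent autocovariance gamma_S(h)

# Observed-count autocorrelation from the formula above.
rho_Y = (np.exp(gamma_h) - 1) / (np.exp(-beta1) + (np.exp(gamma0) - 1))
print(f"rho_S(h) = {rho_S:.2f}  ->  rho_Y(h) = {rho_Y:.3f}")
# Prints rho_Y(h) = 0.329: strong latent dependence looks weak in the counts.
```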
In Section 6.2, we consider various methods of estimation for regression parameters and the parameters determining the covariance matrix function for count time series satisfying (6.4) and (6.7). The main issue is that the joint distribution is given by the n-fold integrals in (6.5) or (6.9), which can be difficult to compute numerically for a fixed set of parameter values, let alone to maximize over the parameter space. Strategies for finding maximum likelihood estimators include Laplace-style approximations to the integral in (6.5) and simulation-based procedures using either MCMC or importance sampling. Alternative estimation procedures based on estimating equations and composite likelihood procedures are also discussed in Section 6.2.
Gamerman et al. (2015; Chapter 8 in this volume) consider the formulation and estimation of dynamic Bayesian models. These models combine state-space dynamics with generalized linear models (GLMs). Such models can include covariates where now the coefficients evolve dynamically. Earlier work on dynamic generalized linear models can be found in Fahrmeir (1992). In the SSM setup of this chapter, the dynamics of the state process {S_t} are specified independently of the observed process. The book by Durbin and Koopman (2012) is an excellent reference on SSMs in a general setting.
A different approach, which is discussed by Fokianos (2015; Chapter 1 in this volume) and Tjøstheim (2015; Chapter 4 in this volume), allows the state process {S_t} to be an explicit function of previous observations. Estimation for these models tends to be simpler than for the models considered in this chapter. On the other hand, incorporating regressors in a meaningful way in these models, as well as developing the underlying theory for them, is far more difficult.
In the remainder of this chapter, estimation procedures are described in Section 6.2 and applied to the Polio data, a classical count time series data set, in Section 6.3. Forecasting for these models is treated in Section 6.4.
6.2 Approaches to Estimation
Not surprisingly, given the numerical complexity associated with approximating the high-dimensional integral required to compute the likelihood (6.9), many different approaches to estimation of nonlinear non-Gaussian models have been used. These include: GLM methods for estimating regression parameters β in which the latent process is ignored, generalized estimating equation (GEE) methods, which use a working covariance matrix to
adjust inference about β for serial dependence effects, composite likelihood methods in which various lower dimensional marginal distributions are combined to define an objective function to be maximized over both β and ψ, approximations to the likelihood such as the Laplace approximation or the penalized quasi likelihood (PQL) method, and use of importance sampling and other Monte Carlo methods. In this section, the main methods
are reviewed and compared. Bayesian methods are not covered in this chapter but are reviewed elsewhere in this volume (see, for example, Chapters 8 and 11). However, the
importance sampling methods to be discussed in this section are also used to implement
Bayesian methods.
6.2.1 Estimating Equations
As a precursor to using a fully efficient estimation method for the SSM, inference based on estimating equations can be useful. Zeger (1988) was one of the first to develop GEE methods for Poisson time series (see also Thavaneswaran and Ravishanker [2015; Chapter 7 in this volume]). He proposed finding an estimate of β by solving
$$U_{\mathrm{GEE}}(\beta) = \frac{\partial \mu^T}{\partial \beta}\, V^{-1}(\beta, \psi)\,\big(y^{(n)} - \mu\big) = 0, \qquad (6.10)$$

where $\mu^T = \big(\exp(x_1^T \beta), \ldots, \exp(x_n^T \beta)\big)$ and V(β, ψ) is a working covariance matrix selected to reflect the latent process autocorrelation structure (with parameters ψ) and the variance of the observations.
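As an illustration, the following Python sketch solves (6.10) by Fisher scoring for the Poisson case with log link. The AR(1) working correlation R, the working variance sigma2, and all numerical values are assumptions made for the example; in Zeger's (1988) method, the latent-process parameters ψ would themselves be estimated by the method of moments.

```python
import numpy as np

def gee_poisson(y, X, R, sigma2, n_iter=100, tol=1e-8):
    """Sketch: solve the GEE (6.10) by Fisher scoring for a Poisson model
    with log link. R is a working correlation matrix for the latent
    process and sigma2 its working variance (both taken as given here)."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        D = mu[:, None] * X                              # d(mu)/d(beta)
        V = np.diag(mu) + sigma2 * np.outer(mu, mu) * R  # working Var(y)
        Vinv_D = np.linalg.solve(V, D)
        Vinv_r = np.linalg.solve(V, y - mu)
        step = np.linalg.solve(D.T @ Vinv_D, D.T @ Vinv_r)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical usage: intercept plus scaled trend, AR(1) working correlation.
rng = np.random.default_rng(0)
n = 168
X = np.column_stack([np.ones(n), np.arange(1, n + 1) / n])
R = 0.8 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
y = rng.poisson(np.exp(X @ np.array([0.2, -0.5])))  # toy counts
print(gee_poisson(y, X, R, sigma2=0.5))
```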
Zeger (1988) shows that, in the Poisson case, the method is consistent and asymptotically normal with appropriate estimates of the covariance matrix. He applies the method to the Polio data using an AR(1) specification for the lognormal latent process. He also develops a method of moments approach to estimating the variance and autocovariances of the latent process. Davis et al. (1999) develop a bias adjustment for this method based on the GLM estimates of β, which we discuss next.
The most elementary method for estimating β alone uses the GLM estimating equation derived assuming that there is no latent process in (6.7), i.e., it is falsely assumed that $S_t = x_t^T \beta$ so that the $Y_t$ are independent and not just conditionally so. Therefore, the GLM estimator $\hat{\beta}_n$ of β solves the score equation corresponding to the quasilikelihood under the independence assumption:
$$U_{\mathrm{GLM}}(\beta) = X^T\big(y^{(n)} - \mu\big) = 0, \qquad (6.11)$$
where $x_t^T$ is the tth row of the design matrix X. See Davis et al. (1999, 2000) for the Poisson case, Davis and Wu (2009) for the negative binomial case in particular and some other members of the exponential family, and Wu and Cui (2013) for the binary case. GLM provides preliminary estimates of the regression parameters needed to assess the existence of a latent process and the nature of its autocorrelation structure, at least for the Poisson case, as discussed in Davis et al. (2000).
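For the Poisson case, these preliminary GLM estimates can be computed with standard software. The sketch below uses Python's statsmodels on simulated placeholder data (not the Polio series); note the caveat about the reported standard errors, discussed next.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data: Poisson counts with an intercept and scaled trend t/n.
rng = np.random.default_rng(1)
n = 168
X = np.column_stack([np.ones(n), np.arange(1, n + 1) / n])
y = rng.poisson(np.exp(0.2 - 0.5 * X[:, 1]))  # toy counts, no latent part

# GLM fit of (6.11): the latent process is deliberately ignored.
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)  # preliminary estimate of beta
# NOTE: fit.bse assumes independent observations; with a latent process
# present, these standard errors can be seriously misleading (see below).
```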
The central limit theorem has also been established for GLM estimates for the Poisson
case (Davis et al., 2000), the negative binomial case (Davis and Wu, 2009) and the binary
case (Wu and Cui, 2013). For the Poisson case, Davis et al. (2000) provide formulae for
estimating the asymptotic covariance based on parametric estimators of the latent pro-
cess model autocovariances. They demonstrate that these are usefully accurate through
simulations and applications to data. As pointed out by Wu (2012), use of nonparametric estimates of the autocovariances in a plug-in estimator of the asymptotic covariance will not provide consistent estimators of the asymptotic covariance. Instead, he proposes the use of kernel-based estimates and shows that they are consistent. It is particularly important that the standard errors produced by GLM (as implemented in standard software packages) not be used, because they can be seriously misleading. For example, in the case of the Polio data set to be reviewed in Section 6.3, the standard error reported using GLM for the important linear time trend parameter is 1.40, whereas the correct asymptotic standard error, calculated as just described, is 4.11 (almost 3 times larger).
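A corrected covariance can be obtained from standard estimating-equation (sandwich) asymptotics: the GLM score $X^T(y^{(n)} - \mu)$ has variance $X^T \mathrm{Var}(Y)\, X$ rather than the $X^T \mathrm{diag}(\mu)\, X$ that independence would give. The Python sketch below is in the spirit of, not a transcription of, the Davis et al. (2000) formulae; the AR(1) latent autocovariance and the plug-in values sigma2 and rho are assumed.

```python
import numpy as np

def glm_sandwich_se(X, beta_hat, sigma2, rho):
    """Sketch: sandwich-type standard errors for the Poisson GLM estimate,
    assuming a multiplicative latent process with variance sigma2 and
    AR(1) autocorrelation rho (plug-in values taken as given)."""
    n = X.shape[0]
    mu = np.exp(X @ beta_hat)
    R = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    V_Y = np.diag(mu) + sigma2 * np.outer(mu, mu) * R  # Var(y) under the SSM
    bread = np.linalg.inv(X.T @ (mu[:, None] * X))     # (X' diag(mu) X)^{-1}
    meat = X.T @ V_Y @ X                               # variance of the score
    return np.sqrt(np.diag(bread @ meat @ bread))
```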
In the Poisson and negative binomial cases, consistency and asymptotic normality have been established for the GLM estimate $\hat{\beta}_{\mathrm{GLM}}$ under suitable conditions on the regressors and log-link function. The regressors $x_{nt}$ are typically defined in terms of a smooth function $f: [0, 1] \to \mathbb{R}^d$ on the unit interval, having the form $x_{nt} = f(t/n)$. At least for the Poisson and negative binomial cases, this scaling of the regressors by the sample size is required to prevent degeneracy. Regressors can also include harmonic functions (for seasonal components) as well as observed realizations from stationary random processes. In the above treatments, the latent process {α_t} is required to be a stationary Gaussian ARMA model, or $\epsilon_t = \exp(\alpha_t)$ is required to be a stationary strongly mixing process with finite (4 + δ)th moment. In the Poisson and negative binomial cases with log-link functions, the optimization of the quasi loglikelihood objective function can be performed over all of $\mathbb{R}^d$ since this function is concave.
The same argument can be applied to show that the GLM estimators converge to a limit point and are asymptotically normal when the quasi loglikelihood function is concave. However, in order for the limit point to be the true parameter, the identifiability condition $E\{b'(x_t^T \beta + \alpha_t)\} = b'(x_t^T \beta)$ for all t must be met. This is satisfied for some count response distributions, including the Poisson and negative binomial distributions with log link functions. However, for the binomial response family, this identity cannot hold for any link function when the normal distribution is used for the latent process. The same argument can be applied to show that the GEE method proposed by Zeger (1988) will not provide consistent estimates of β in the binomial case.
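For intuition, consider the Poisson case with log link, so that b'(u) = e^u, and assume (as in Davis et al., 2000) that the latent process is normalized so that E[e^{α_t}] = 1, that is, α_t ∼ N(−σ²/2, σ²). Then

$$E\{b'(x_t^T \beta + \alpha_t)\} = e^{x_t^T \beta}\, E\big[e^{\alpha_t}\big] = e^{x_t^T \beta} = b'(x_t^T \beta),$$

so the condition holds for every t. For the binomial family with logit link, b'(u) = e^u/(1 + e^u), averaging over a normal α_t flattens the response curve, and no adjustment of β restores the identity; this is the source of the bias treated next.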
Correction for the bias in the binary case requires integration over the marginal distribution of α_t, which in turn requires knowledge of its variance, something that is not available from the GLM estimates. Wu and Cui (2013) introduce a modified GLM method that adjusts for the bias in the regression parameter estimates. They introduce a modified estimating equation in which the success probability used in GLM, $\pi_t = 1/(1 + \exp(-x_{nt}^T \beta))$, is replaced by the marginal mean $m(x_{nt}^T \beta) = P(Y_t = 1)$. The monotonically increasing function m(u) is estimated nonparametrically using local linear regression. Wu and Cui (2013) prove that this method produces consistent and asymptotically normal estimates of the regression coefficients. They also provide a method for calculating the asymptotic covariance matrix.
6.2.2 Likelihood-based Methods
The main impediments to using maximum likelihood methods for routine analysis are
threefold:
Optimizing the likelihood: Calculation of the likelihood in (6.9) cannot be done in closed form for moderate sample sizes, let alone large dimensions on the order of magnitude of 1,000 or 10,000. In order to find the maximum likelihood estimator, one
typically resorts to numerical optimization of an approximation to L(θ). These meth-
ods often do not rely on derivative information. If a gradient or Hessian is required,
then another d or d(d + 1)/2 integrals need to be approximated, where d = dim(θ).
Computing standard errors: Once an approximation $\hat{\theta}$ to the MLE has been produced,
standard errors are required for inference purposes. This is especially important
in the model tting stage when perhaps a number of covariates are being consid-
ered for inclusion in the model. Because of the large dimensional integrals dening
the likelihood, approaches using approximations to the Hessian or scoring algo-
rithms are problematic. Often one resorts to numerical approximations to derivatives
of the approximating likelihood evaluated at the estimated value. However, these
estimates can be quite variable and numerically sensitive to the choice of tuning
parameters in the numerical algorithms. Bootstrap methods, in which each bootstrap replicate would require its own n-dimensional integral to be computed, are one possible workaround for computing more reliable standard errors.
Asymptotic theory: There are currently no proofs that the MLE $\hat{\theta}$ is consistent or asymptotically normal. One would certainly expect these properties to hold, but since the
form of the likelihood is rather intractable, the arguments are not standard adap-
tations of existing proofs. Nonetheless, it is important to have a complete theory
worked out for maximum likelihood estimation in these models in order to ensure
that inferences about the parameters are justifiable.
As a result of these practical concerns, a large variety of methods have been proposed to
approximate the likelihood and, to a lesser extent, derivatives of these approximations. The
main approaches in the literature are approximations to the integrand in (6.9), which can
be integrated in closed form to get an approximation to the likelihood. Improvements to
these approximations are typically based on Monte Carlo methods and importance sam-
pling from approximating importance densities. Quasi Monte Carlo (QMC) methods based
on randomly shifted deterministic lattices of points in $\mathbb{R}^n$ have also been applied in recent years. One recent attempt is given in Sinescu et al. (2014), where a Poisson model with a
constant mean plus AR(1) state process is considered. While QMC methods hold promise,
further development is required before they become competitive with other methods
reviewed here.
6.2.3 Earlier Monte Carlo and Approximate Methods
Nelson and Leroux (2006) review various methods in this class for estimating the likelihood function, including the Monte Carlo expectation method first used for the Poisson response AR(1) model in Chan and Ledolter (1995) based on Gibbs sampling for the E-step, a version due to McCulloch based on Metropolis–Hastings sampling, Monte Carlo Newton–Raphson (Kuk and Cheng, 1999) and a modified version of the iterative bias correction
method of Kuk (1995). They compare performance of these methods with the original esti-
mating equations approach of Zeger (1988) and the PQL approach (based on a Laplace
approximation to the likelihood) of Breslow and Clayton (1993) using simulations and by
application to the Polio data in Zeger (1988). Davis and Rodriguez-Yam (2005) also review
the Monte Carlo Expectation Maximization and Monte Carlo Newton–Raphson methods.
Apart from the Bayesian method, all the methods reviewed by Nelson and Leroux provide, as a byproduct of the parameter estimation, estimated covariance matrices under the assumption that the estimates satisfy a central limit theorem.
Nelson and Leroux conclude: “The results have clearly shown that the different methods commonly used at present for fitting a log-linear generalized linear mixed model with an autoregressive random effects correlation structure do yield different sets of parameter estimates, in particular, the parameters related to the random effects distribution.” We will summarize the main points of difference between the various methods in Section 6.3, when they are applied to the Polio data set.
6.2.4 Laplace and Gaussian Approximations
Based on (6.5), the likelihood for the unknown parameters is

$$L(\theta) = \int_{\mathbb{R}^n} e^{F(\alpha, y; \theta)}\, d\alpha, \qquad (6.12)$$
where

$$F(\alpha, y; \theta) = \sum_{t=1}^{n} \big\{\log p\big(y_t \mid x_{nt}^T \beta + \alpha_t\big)\big\} - \frac{1}{2}\,\alpha^T V \alpha + \frac{1}{2}\log\det(V) - \frac{n}{2}\log 2\pi, \qquad (6.13)$$

and $V = \Gamma_n^{-1}$ is the inverse of the covariance matrix of $(\alpha_1, \ldots, \alpha_n)^T$. For many models, the exponent F in (6.12) is unimodal in α. Laplace's approximation replaces the integrand by a normal density that matches that obtained using a second order Taylor series expansion of F around its mode

$$\alpha^{*} = \arg\max_{\alpha} F(\alpha, y; \theta). \qquad (6.14)$$
To nd this mode, the Newton–Raphson method has proved effective for the primary
model considered here. Let F
(α, y; θ) denote the rst derivative vector and F

(α, y; θ) the
matrix of second derivatives both with respect to α. For the conditionally independent
model with a Gaussian latent process of (6.13), it follows that
$$F'(\alpha, y; \theta) = \sum_{t=1}^{n} \frac{\partial}{\partial \alpha} \log p\big(y_t \mid x_t^T \beta + \alpha_t\big) - V\alpha \qquad (6.15)$$

and

$$F''(\alpha, y; \theta) = -(K + V), \qquad (6.16)$$

where, as a result of the conditional independence, K is the diagonal matrix given by

$$K = -\mathrm{diag}\left\{\frac{\partial^2}{\partial \alpha_t^2} \log p(y_t \mid \alpha_t; \theta);\; t = 1, \ldots, n\right\}.$$
Let $\alpha^{(k)}$ be the kth iterate (where dependence on y and θ has been suppressed in the notation). The Newton–Raphson updates are given by

$$\alpha^{(k+1)} = \alpha^{(k)} - \Big[F''\big(\alpha^{(k)}\big)\Big]^{-1} F'\big(\alpha^{(k)}\big). \qquad (6.17)$$
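As an illustration, the Python sketch below implements the recursion (6.17) for the Poisson case with log link and then evaluates the resulting Laplace approximation log L(θ) ≈ F(α*) + (n/2) log 2π − (1/2) log det(K + V). The AR(1) latent covariance and all numerical values in the usage lines are assumptions made for the example.

```python
import numpy as np
from scipy.linalg import toeplitz

def laplace_mode(y, X, beta, V, n_iter=100, tol=1e-10):
    """Newton-Raphson iteration (6.17) for the mode of F(alpha, y; theta)
    in the Poisson/log-link case; V is the precision matrix of alpha."""
    alpha = np.zeros(len(y))
    for _ in range(n_iter):
        lam = np.exp(X @ beta + alpha)       # conditional Poisson means
        grad = (y - lam) - V @ alpha         # F' of (6.15)
        K = np.diag(lam)                     # K of (6.16), Poisson case
        step = np.linalg.solve(K + V, grad)  # -(F'')^{-1} F'
        alpha = alpha + step
        if np.max(np.abs(step)) < tol:
            break
    return alpha, K

def laplace_loglik(y, X, beta, Gamma):
    """Laplace approximation to log L(theta) in (6.12)."""
    n = len(y)
    V = np.linalg.inv(Gamma)                 # precision of the latent vector
    alpha, K = laplace_mode(y, X, beta, V)
    lam = np.exp(X @ beta + alpha)
    # log p(y_t | .) summed, dropping the additive constant -log(y_t!)
    cond = np.sum(y * (X @ beta + alpha) - lam)
    F = (cond - 0.5 * alpha @ V @ alpha
         + 0.5 * np.linalg.slogdet(V)[1] - 0.5 * n * np.log(2 * np.pi))
    return F + 0.5 * n * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(K + V)[1]

# Hypothetical usage with an assumed AR(1) latent covariance Gamma:
rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), np.arange(1, n + 1) / n])
Gamma = 0.5 * toeplitz(0.7 ** np.arange(n))
y = rng.poisson(np.exp(X @ np.array([0.3, 0.5])))  # toy counts
print(laplace_loglik(y, X, np.array([0.3, 0.5]), Gamma))
```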