6: State Space Models for Count Time Series (4/5)

136 Handbook of Discrete-Valued Time Series

Davis and Yau (2011). They also showed in the case $k = 2$ that if one uses all pairs of observations instead of just consecutive pairs, that is, if $\mathrm{CPL}_2$ is replaced by the sum of the log-likelihoods of $(Y_s, Y_t)$ for all $s < t$, then the composite likelihood estimator need no longer be consistent. Also note that $k = 1$, which corresponds to using only the marginal distributions, is allowed. In this case one might be able to estimate the parameters of the marginal distribution consistently, but there is no hope of estimating the dependence parameters, since joint distributions are not part of the objective function: the dependence parameters are not identifiable.

To illustrate the use of the composite likelihood, consider Example 1 from Section 6.1, in which the observational density is Poisson and the state process $\{\alpha_t\}$ follows an AR(1) process. That is, given the state process $\{\alpha_t\}$, the $y_t$ are independent and Poisson-distributed with mean $\lambda_t = e^{\beta + \alpha_t}$. The SSM is then specified by the equations

$$p(y_t \mid \alpha_t; \theta) = \frac{e^{-e^{\beta + \alpha_t}}\, e^{(\beta + \alpha_t) y_t}}{y_t!}, \qquad \alpha_t = \phi\,\alpha_{t-1} + \eta_t,$$

where $\eta_t \sim \mathrm{IID\,N}(0, \sigma^2)$, $|\phi| < 1$, and $\theta = \{\beta, \phi, \sigma^2\}$ is the parameter vector.

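Let the observed data be $y^{(n)} = \{y_1, y_2, \ldots, y_n\}$ and set $\alpha^{(n)} = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$. The pairwise log-likelihood (here we are taking $k = 2$) is given by

$$\mathrm{CPL}_2(\theta; y^{(n)}) = \sum_{t=1}^{n-1} \log \iint p(y_t \mid \alpha_t; \theta)\, p(y_{t+1} \mid \alpha_{t+1}; \theta)\, f_\theta(\alpha_t, \alpha_{t+1})\, d\alpha_t\, d\alpha_{t+1}.$$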

So, unlike the full likelihood, which requires the computation of an $n$-dimensional integral, the pairwise likelihood requires the computation of $n - 1$ two-dimensional integrals. Each of these integrals can be computed rather quickly using numerical methods such as Gauss–Hermite quadrature.
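As a rough illustration of the quadrature step, the following Python sketch evaluates the pairwise log-likelihood for the Poisson AR(1) model of Example 1. It assumes that the state process is started in its stationary distribution, so that $(\alpha_t, \alpha_{t+1})$ is bivariate normal, and it uses a tensor-product Gauss–Hermite rule. The function name `pairwise_loglik` and the default of 30 nodes per dimension are illustrative choices, not taken from Davis and Yau (2011).

```python
import math
import numpy as np

def pairwise_loglik(theta, y, n_nodes=30):
    """Pairwise (k = 2) composite log-likelihood for the Poisson AR(1) SSM.

    theta = (beta, phi, sigma2); y is a 1-D sequence of counts.
    Assumption: alpha_t is stationary, alpha_t ~ N(0, sigma2 / (1 - phi^2)).
    """
    beta, phi, sigma2 = theta
    y = np.asarray(y, dtype=float)

    # Nodes/weights for the weight function exp(-x^2 / 2); after dividing by
    # sqrt(2*pi) the weights sum to 1 and approximate E[g(Z)], Z ~ N(0, 1).
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)

    # Write alpha_t = s * z1 and alpha_{t+1} = phi * alpha_t + sigma * z2
    # with z1, z2 independent standard normals.
    s = np.sqrt(sigma2 / (1.0 - phi ** 2))
    a1 = s * x[:, None]                              # shape (m, 1)
    a2 = phi * a1 + np.sqrt(sigma2) * x[None, :]     # shape (m, m)
    W = w[:, None] * w[None, :]                      # product weights

    def log_pois(yv, a):
        # log of the Poisson pmf with mean exp(beta + a)
        eta = beta + a
        return yv * eta - np.exp(eta) - math.lgamma(yv + 1.0)

    total = 0.0
    for t in range(len(y) - 1):
        log_integrand = log_pois(y[t], a1) + log_pois(y[t + 1], a2)
        total += np.log(np.sum(W * np.exp(log_integrand)))
    return total
```

An estimate could then be obtained by passing the negative of this function to a general-purpose optimizer such as `scipy.optimize.minimize`, with $\phi$ and $\sigma^2$ suitably constrained; for very large counts a log-sum-exp formulation would be more robust than exponentiating the integrand directly.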

A comparison of the performance of the composite likelihood relative to the approximate likelihood procedure described in Section 6.2.4 was made via a simulation study in Davis and Yau (2011) (see Table 3 of the paper). The results show that $\mathrm{CPL}_2$ performed comparably to the AIS estimates. It is also worth noting that using higher orders of $k$, such as $k = 3$ and $4$, often gave worse estimates.

Ultimately, the estimation objective is to compute the maximum likelihood estimates, and much effort, as described in earlier sections, has gone into finding approximations either to the likelihood function or to its optimizer. Even if one could compute the MLE directly, the proof of its consistency and asymptotic normality has not been fully argued. In contrast, one potential advantage of composite likelihood methods is that one can give a rigorous argument for the consistency and asymptotic normality of such estimates. We give a brief outline of such an argument, following the lines of the one given in Davis and Yau (2011). For the setup of Example 1, let

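$$\mathrm{cpl}_t(\theta) = \mathrm{cpl}(\theta; y_t, y_{t+1}) = \log \iint p(y_t \mid \alpha_t; \theta)\, p(y_{t+1} \mid \alpha_{t+1}; \theta)\, f_\theta(\alpha_t, \alpha_{t+1})\, d\alpha_t\, d\alpha_{t+1}$$

and note that $\mathrm{CPL}_2(\theta; y^{(n)}) = \sum_{t=1}^{n-1} \mathrm{cpl}_t(\theta)$. Let $\theta_0$ and $\hat\theta$ be the true value and the $\mathrm{CPL}_2$ estimator of the parameter, respectively. Using a Taylor series expansion of $\mathrm{CPL}_2'(\hat\theta; y^{(n)})$, the derivative of $\mathrm{CPL}_2$, around $\theta_0$ shows that $\sqrt{n}(\hat\theta - \theta_0)$ is asymptotically equivalent to

$$-\left(\frac{1}{n}\sum_{t=1}^{n-1} \mathrm{cpl}_t''(\theta_0)\right)^{-1} \frac{1}{\sqrt{n}}\sum_{t=1}^{n-1} \mathrm{cpl}_t'(\theta_0). \tag{6.30}$$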

Since the process $\{Y_t\}$ is stationary and strongly mixing at a geometric rate, it follows from the ergodic theorem that

$$\frac{1}{n}\sum_{t=1}^{n-1} \mathrm{cpl}_t''(\theta_0) \xrightarrow{\text{a.s.}} E\big(\mathrm{cpl}_1''(\theta_0)\big).$$

Moreover, since $\{\mathrm{cpl}_t'(\theta_0)\}$ is also a stationary and strongly mixing sequence, a standard central limit theorem for strongly mixing sequences (e.g., Doukhan, 1994) shows the asymptotic normality of $\frac{1}{\sqrt{n}}\sum_{t=1}^{n-1} \mathrm{cpl}_t'(\theta_0)$ with covariance matrix

$$\sum_{h=-\infty}^{\infty} \gamma(h),$$

where $\gamma(h)$ is the autocovariance matrix of $\{\mathrm{cpl}_t'(\theta_0)\}$. Hence, $\sqrt{n}(\hat\theta - \theta_0)$ is asymptotically normal with mean 0 and covariance matrix given by





$$\Sigma := \Big(E\,\mathrm{cpl}_1''(\theta_0)\Big)^{-1} \left\{\sum_{h=-\infty}^{\infty} \gamma(h)\right\} \Big(E\,\mathrm{cpl}_1''(\theta_0)\Big)^{-1}. \tag{6.31}$$

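A consistent estimator for $\Sigma$ is given by

$$\hat\Sigma = \left(\frac{1}{n}\sum_{t=1}^{n-1} \mathrm{cpl}_t''(\hat\theta)\right)^{-1} \left\{\sum_{k=-r_n}^{r_n} \left(1 - \frac{|k|}{n}\right) \hat\gamma(k)\right\} \left(\frac{1}{n}\sum_{t=1}^{n-1} \mathrm{cpl}_t''(\hat\theta)\right)^{-1}, \tag{6.32}$$

where $r_n \to \infty$, $r_n/n \to 0$, and

$$\hat\gamma(k) = \frac{1}{n}\sum_{t=k+1}^{n-1} \mathrm{cpl}_t'(\hat\theta)\, \mathrm{cpl}_{t-k}'(\hat\theta)^{\top}.$$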
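The estimator (6.32) is straightforward to assemble once the per-pair scores and Hessians are available. The sketch below is a minimal NumPy version under the assumption that these derivatives have already been computed at $\hat\theta$ (for instance by numerical differentiation); the function name `sandwich_cov` and its argument layout are hypothetical, and for negative lags it uses $\hat\gamma(-k) = \hat\gamma(k)^{\top}$.

```python
import numpy as np

def sandwich_cov(scores, hessians, r):
    """Sandwich covariance estimate in the spirit of (6.32).

    scores   : array (n-1, p), row t holds cpl_t'(theta_hat)
    hessians : array (n-1, p, p), slice t holds cpl_t''(theta_hat)
    r        : truncation lag r_n (user-chosen; should grow slowly with n)
    """
    scores = np.asarray(scores, dtype=float)
    hessians = np.asarray(hessians, dtype=float)
    m, p = scores.shape          # m = n - 1 consecutive pairs
    n = m + 1

    # "Bread": inverse of the averaged Hessian.
    bread = np.linalg.inv(hessians.sum(axis=0) / n)

    # "Meat": weighted sum of sample autocovariances of the scores.
    meat = np.zeros((p, p))
    for k in range(-r, r + 1):
        lag = abs(k)
        # gamma_hat(lag) = (1/n) * sum_t s_t s_{t-lag}^T
        g = scores[lag:].T @ scores[:m - lag] / n
        if k < 0:
            g = g.T              # gamma_hat(-k) = gamma_hat(k)^T
        meat += (1.0 - lag / n) * g

    return bread @ meat @ bread
```

Approximate standard errors for $\hat\theta$ would then be the square roots of the diagonal of the returned matrix divided by $n$. With a poorly chosen truncation lag the middle term need not be positive semidefinite, so the output is worth checking before use.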

The asymptotic variance of a composite likelihood estimator typically has a sandwich-type form as given by (6.31). Such quantities can be difficult to estimate. One approach, in addition to using (6.32), is via the bootstrap for time series. The block bootstrap or stationary bootstrap (see the discussion paper Politis et al. (2003) for a description of these methods) can be used for generating nonparametric bootstrap replicates of a stationary time series. This methodology provides an attractive alternative for computing asymptotic variances of the estimates and for providing approximations to the sampling distribution of $\sqrt{n}(\hat\theta - \theta_0)$.
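To make the resampling idea concrete, here is a small sketch of the stationary bootstrap, in which blocks of random (geometrically distributed) length are concatenated and the series is wrapped circularly. The helper names, the mean block length, and the generic `estimator` callable (which stands in for refitting the model on each replicate) are assumptions for illustration; the approach presumes a stationary series, so covariate effects such as trends would need separate handling.

```python
import numpy as np

def stationary_bootstrap_indices(n, mean_block, rng):
    """Index sequence for one stationary-bootstrap replicate of length n."""
    p = 1.0 / mean_block             # probability of starting a new block
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < p:
            idx[t] = rng.integers(n)          # jump to a new random start
        else:
            idx[t] = (idx[t - 1] + 1) % n     # continue block, wrapping around
    return idx

def bootstrap_se(y, estimator, n_boot=200, mean_block=10, seed=0):
    """Bootstrap standard error of estimator(y) for a stationary series y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    reps = np.array([
        estimator(y[stationary_bootstrap_indices(len(y), mean_block, rng)])
        for _ in range(n_boot)
    ])
    return reps.std(axis=0, ddof=1)
```

For example, `bootstrap_se(y, np.mean)` gives a standard error for the sample mean that respects serial dependence; replacing `np.mean` by a function that maximizes the pairwise likelihood would give bootstrap standard errors for the composite likelihood estimates, at a correspondingly higher computational cost.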

6.3 Applications to Analysis of Polio Data

In this section, we summarize a variety of analyses using the Poisson AR model for the Polio data set consisting of the monthly number of U.S. cases of poliomyelitis from 1970 to 1983, first analysed by Zeger (1988). We parameterize the model as in Davis and Rodriguez-Yam (2005), for example, in which the distribution of $Y_t$ given the state $\alpha_t$ is Poisson with rate $\lambda_t = e^{x_t^{\top}\beta + \alpha_t}$. Here, $\beta := (\beta_1, \ldots, \beta_6)^{\top}$, $x_t$ is the vector of covariates given by

$$x_t^{\top} = (1,\ t/1000,\ \cos(2\pi t/12),\ \sin(2\pi t/12),\ \cos(2\pi t/6),\ \sin(2\pi t/6)),$$

and the state process is assumed to follow an AR(1) model. The vector of parameters is $\theta = (\beta_1, \ldots, \beta_6, \phi, \sigma^2)$.
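The covariate vector is simple to build in code. The sketch below constructs the design matrix row by row from the formula above and evaluates the conditional Poisson rate; the function names are hypothetical, and indexing time as $t = 1, \ldots, n$ is an assumption (some analyses center the time index).

```python
import numpy as np

def polio_design_matrix(n):
    """Rows are x_t^T for t = 1, ..., n (time origin is an assumption)."""
    t = np.arange(1, n + 1)
    return np.column_stack([
        np.ones(n),                   # intercept
        t / 1000.0,                   # linear trend
        np.cos(2 * np.pi * t / 12),   # annual cycle
        np.sin(2 * np.pi * t / 12),
        np.cos(2 * np.pi * t / 6),    # semi-annual cycle
        np.sin(2 * np.pi * t / 6),
    ])

def poisson_rate(X, beta, alpha):
    """Conditional rate lambda_t = exp(x_t^T beta + alpha_t)."""
    return np.exp(X @ np.asarray(beta) + np.asarray(alpha))

# Fourteen years of monthly observations (1970 to 1983) give n = 168.
X = polio_design_matrix(168)
```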

Table 6.1 compiles, from a variety of sources, the estimates and their standard errors for the key parameters in this model, namely the coefficient of the linear time trend, $\beta_2$, the serial autocorrelation of the latent process, $\phi$, and the innovation variance, $\sigma^2$. Note that in some analyses, the parameterization of the latent process variance as $\sigma_\alpha^2 = \sigma^2/(1 - \phi^2)$ is used. The table adjusts these results to the above parameterization. Estimates of this process variance, obtained as $\hat\sigma_\alpha^2 = \hat\sigma^2/(1 - \hat\phi^2)$, are presented as the final column to allow additional comparison between the various model fits.

TABLE 6.1

Estimates and Standard Errors for Key Parameters in Various Methods Applied to the Polio Series

| Method (Source) | $\hat\beta_2$ | se($\hat\beta_2$) | $\hat\phi$ | se($\hat\phi$) | $\hat\sigma^2$ | $\hat\sigma_\alpha^2$ |
|---|---|---|---|---|---|---|
| MCEM (Chan and Ledolter, 1995) | −4.62 | 1.38 | 0.89 | 0.04 | 0.09 | 0.41 |
| MCEM[NL] (McCulloch, 1997) | −4.35 | 1.96 | 0.10 | 0.36 | 0.50 | 0.51 |
| Bayes (Oh and Lim, 2001) | −4.24 | 1.72 | 0.66 | 0.16 | 0.32 | 0.56 |
| PQL[NL] (Breslow and Clayton, 1993) | −3.46 | 3.04 | 0.70 | 0.13 | 0.26 | 0.51 |
| AL (Davis and Rodriguez-Yam, 2005) | −3.81 | 2.77 | 0.63 | 0.23 | 0.29 | 0.48 |
| AL-BC (Davis and Rodriguez-Yam, 2005) | −3.96 | 2.77 | 0.73 | 0.23 | 0.30 | 0.65 |
| AIS (Davis and Rodriguez-Yam, 2005) | −3.75 | 2.87 | 0.66 | 0.21 | 0.27 | 0.48 |
| AIS-BC (Davis and Rodriguez-Yam, 2005) | −3.76 | 2.87 | 0.73 | 0.21 | 0.30 | 0.64 |
| MCNR (Kuk and Cheng, 1999) | −3.82 | 2.77 | 0.67 | 0.18 | 0.27 | 0.48 |
| EIS (Jung and Liesenfeld, 2001) | −3.61 | 2.57 | 0.68 | 0.15 | 0.26 | 0.48 |
| GLM (Davis et al., 2000) | −4.80 | 4.11 | — | — | — | — |
| GEE (Zeger, 1988) | −4.35 | 2.68 | 0.82 | — | 0.19 | 0.57 |
| $\mathrm{CPL}_2$ (Davis and Yau, 2011) | −4.74 | 2.54 | 0.49 | 0.21 | 0.37 | 0.49 |
| IBC[NL] (Kuk, 1995) | −5.01 | 3.20 | 0.54 | 0.28 | 0.35 | 0.49 |

Note: $\hat\sigma_\alpha^2 = \hat\sigma^2/(1 - \hat\phi^2)$.


The origin of each method and of its results for the Polio data set is listed in parentheses; results obtained from the application of a method by Nelson and Leroux (2006) are additionally marked with the annotation 'NL'. The methods can be roughly partitioned into three groups. Group 1 consists of two implementations of MCEM (Monte Carlo EM) and a Bayes procedure. Group 2, which consists essentially of approximate likelihood-based methods, comprises PQL (penalized quasi-likelihood), AL (approximate likelihood), AL-BC (bias-corrected AL), AIS (approximate importance sampling), AIS-BC (bias-corrected AIS), MCNR (Monte Carlo Newton–Raphson), and EIS (efficient importance sampling). Note that the first three procedures of this group are not simulation based, while the last four involve some level of simulation. Group 3 consists of nonlikelihood-based procedures: GLM (generalized linear model estimates ignoring the latent process), GEE (generalized estimating equations), $\mathrm{CPL}_2$ (pairwise composite likelihood), and IBC (iterative bias correction using iterative weighted least squares). We exclude from our review the few studies that have used alternative response distributions or latent process distributions for these data so that the methods are compared on the same model.

With the exception of the GLM, GEE, $\mathrm{CPL}_2$, and Bayesian analyses, all other methods aim to obtain approximations to the likelihood estimates and their standard errors. Clearly, there are both substantial differences and similarities between the results for the various methods, a point also noted in Nelson and Leroux (2006). We now discuss these differences and similarities in more detail in an attempt to draw some general conclusions about which methods may be preferred. Of course, this comparison is only for application to a single data set, and much more research is required before general conclusions can be drawn. However, this is the only data set to which all the methods listed have been applied. Unfortunately, simulation evidence comparing the variety of methods is rather limited, with the exception of the results in Nelson and Leroux (2006).

6.3.1 Estimate of Trend Coefficient

The GLM, the IBC method as implemented by Nelson and Leroux (2006), and $\mathrm{CPL}_2$ give the most negative trend estimates. It would appear that the IBC method is not adjusting the bias of the GLM estimate sufficiently well, and this may be a result of iterative weighted least squares being used as the basis for the bias adjustment simulations. It is likely that these methods are substantially biased. Amongst the remaining methods, there appear to be two groups of values for the trend coefficient estimates: Group 1, the values for both implementations of MCEM and the Bayes fit; and Group 2, based on approximations to the likelihood with and without importance sampling (PQL, AL, AL-BC, AIS, AIS-BC, MCNR, and EIS). The concordance in Group 2 is perhaps not surprising since they are all aimed at approximating the likelihood. However, it is surprising that the Group 1 results do not agree as closely with the Group 2 results. Turning to comparison of the estimated standard errors, those for Group 1 appear to be substantially smaller than those for Group 2, and within this latter group there is considerable agreement. Also note that the MCEM and Bayes methods are biasing the point estimates towards larger negative values and biasing the associated estimated standard errors downwards. The net effect of these two biases would be to increase the ratio of estimate to standard error, resulting in a higher chance of concluding that there is a significant downward trend in Polio cases over the time period of observation. On the other hand, for Group 2, these test ratios would all be consistent with a conclusion of no downward trend.
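The test ratios referred to here can be tabulated directly from the trend columns of Table 6.1. The short sketch below does so for Groups 1 and 2; the comparison against a one-sided 5% normal critical value of 1.645 is our illustrative assumption, not a threshold stated in the text.

```python
# (trend estimate, standard error) pairs copied from Table 6.1
trend = {
    "MCEM":     (-4.62, 1.38),
    "MCEM[NL]": (-4.35, 1.96),
    "Bayes":    (-4.24, 1.72),
    "PQL[NL]":  (-3.46, 3.04),
    "AL":       (-3.81, 2.77),
    "AL-BC":    (-3.96, 2.77),
    "AIS":      (-3.75, 2.87),
    "AIS-BC":   (-3.76, 2.87),
    "MCNR":     (-3.82, 2.77),
    "EIS":      (-3.61, 2.57),
}
group1 = ["MCEM", "MCEM[NL]", "Bayes"]
group2 = ["PQL[NL]", "AL", "AL-BC", "AIS", "AIS-BC", "MCNR", "EIS"]

# Ratio of estimate to standard error for each method
ratios = {m: est / se for m, (est, se) in trend.items()}

CRIT = 1.645  # assumed one-sided 5% normal critical value
for m in group1 + group2:
    verdict = "downward trend" if ratios[m] < -CRIT else "no clear trend"
    print(f"{m:9s} ratio = {ratios[m]:6.2f}  ->  {verdict}")
```

Under this convention every Group 1 ratio falls below the critical value while every Group 2 ratio lies above it, which is the contrast described in the paragraph above.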




6.3.2 Estimate of Latent Process Parameters

Interestingly, the estimates of overall variance $\hat\sigma_\alpha^2$ are remarkably similar for all methods apart from the GEE method, the Bayes method, and the bias-corrected AL and AIS methods. This suggests that the likelihood-based methods (including the MCEM methods) are all finding the same degree of overall variability in the latent process. However, the two MCEM methods differ substantially in their identification of the source of this latent process variability. The MCEM method of McCulloch (1997) severely underestimates the autocorrelation $\phi$, with a correspondingly larger value for $\hat\sigma^2$, when compared with the MCEM method as implemented by Chan and Ledolter (1995) and the other likelihood approximations. The reason for this is not clear. However, since the only difference between the two MCEM methods is the use of the Gibbs sampler or Metropolis–Hastings, it may be that these are not exploring the sample space sufficiently well when Monte Carlo draws are being generated, resulting in what appears to be a lack of identifiability between $\phi$ and $\sigma^2$. The bias-adjusted AL and AIS methods and the $\mathrm{CPL}_2$ method appear to apportion overall variability to autocorrelation and innovation variance differently than the other likelihood approximations (AL, AIS, MCNR, and EIS), which are quite consistent with each other. Incidentally, the use of AD as in Skaug (2002) gives identical results to the AL method and, because of this, these results are not recorded in Table 6.1. Further, the use of $M = 100$, $M = 1000$, or $M = 5000$ importance samples as reported independently by Skaug (2002) has very little impact on point estimates.

6.3.3 Comparisons of Computational Speed

Nelson and Leroux (2006) and Skaug (2002) provide some information on comparison of speeds between some of the methods. However, they mix speeds reported by the original authors with those obtained in their own applications and simulations, acknowledging that different generations of computer processors were used. There is no comprehensive comparison of speeds for the models listed in Table 6.1. However, it is clear that the approximate likelihood methods are the fastest overall, requiring no simulations or Monte Carlo to obtain estimates.

6.3.4 Some Recommendations

Based on this comparison on a single data set (with all the limitations for generality that implies):

1. Overall, the use of the Laplace approximation to the likelihood results in point estimates and standard errors that are sufficiently close to those obtained from the more accurate importance-sample-augmented approximations. It would appear that, for these data at least, importance sampling is not providing much additional benefit to inference.

2. Use of $\mathrm{CPL}_2$ appears to provide biased results and has no clear computational advantage over the Laplace approximation method.

3. The MCEM method should be avoided until an explanation can be found for the obvious differences between results from two different implementations (Gibbs versus Metropolis–Hastings sampling) and for the clear bias in point estimates and underestimation of standard errors that the method produces.
