

136 Handbook of Discrete-Valued Time Series
Davis and Yau (2011). They also showed in the case $k = 2$ that if one uses all pairs of
observations instead of just consecutive pairs of observations, that is, if $\mathrm{CPL}_2$ is
replaced by the sum of the log-likelihoods of $(Y_s, Y_t)$ for all $s < t$, then the composite
likelihood estimator need no longer be consistent. Also note that $k = 1$, which corresponds to
just marginal distributions, is allowed. While in this case one might be able to consistently
estimate parameters associated with the marginal distribution, there is no hope of estimating
dependence parameters since joint distributions are not part of the objective function. In this
case, the dependence parameters are not identifiable.
To illustrate the use of the composite likelihood, consider Example 1 from Section 6.1 in
which the observational density is Poisson and the state process $\{\alpha_t\}$ follows an AR(1)
process. That is, given the state process $\{\alpha_t\}$, the $y_t$ are independent and
Poisson-distributed with mean $\lambda_t = e^{\beta+\alpha_t}$. The SSM is then specified by the
equations
$$p(y_t \mid \alpha_t; \theta) = e^{-e^{\beta+\alpha_t}} \, e^{(\beta+\alpha_t) y_t} / y_t!,$$
$$\alpha_t = \phi \alpha_{t-1} + \eta_t,$$
where $\eta_t \sim \mathrm{IID\,N}(0, \sigma^2)$, $|\phi| < 1$, and $\theta = \{\beta, \phi, \sigma^2\}$
is the parameter vector.
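Let the observed data be $\mathbf{y}_n = \{y_1, y_2, \ldots, y_n\}$ and set
$\boldsymbol{\alpha}_n = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$. The pairwise log-likelihood
(here we are taking $k = 2$) is given by
$$\mathrm{CPL}_2(\theta; \mathbf{y}_n) = \sum_{t=1}^{n-1} \log \int\!\!\int p(y_t \mid \alpha_t; \theta) \, p(y_{t+1} \mid \alpha_{t+1}; \theta) \, f_\theta(\alpha_t, \alpha_{t+1}) \, d\alpha_t \, d\alpha_{t+1}.$$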
So unlike the full likelihood, whose computation requires an $n$-dimensional integral, the
pairwise likelihood requires the computation of $(n - 1)$ two-dimensional integrals. Each of
these integrals can be computed rather quickly using numerical methods such as Gauss–Hermite
quadrature.
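As a minimal sketch of how such a computation might look, the following Python function evaluates the pairwise log-likelihood for the Poisson-AR(1) model with a product Gauss–Hermite rule. It is an illustration only, not the implementation of Davis and Yau (2011). The function name `cpl2`, the number of quadrature nodes, and the assumption that the state process is started from its stationary distribution (so that $\alpha_t$ has variance $\sigma^2/(1-\phi^2)$) are choices made here for the example.

```python
import math
import numpy as np

def cpl2(theta, y, n_nodes=20):
    """Pairwise composite log-likelihood for the Poisson-AR(1) model (sketch).

    theta = (beta, phi, sigma2); y is a sequence of counts.
    Assumes a stationary AR(1) state process (an assumption of this sketch).
    """
    beta, phi, sigma2 = theta
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    z = np.sqrt(2.0) * x                      # nodes rescaled for a N(0, 1) variable
    logw = np.log(w) - 0.5 * np.log(np.pi)    # normalised weights (sum to one)

    sd_alpha = np.sqrt(sigma2 / (1.0 - phi ** 2))
    a1 = sd_alpha * z[:, None]                      # values of alpha_t
    a2 = phi * a1 + np.sqrt(sigma2) * z[None, :]    # values of alpha_{t+1} given alpha_t
    logw2 = logw[:, None] + logw[None, :]           # product-rule log weights

    def log_pois(yv, a):
        # log of the Poisson density with log-mean beta + a
        eta = beta + a
        return yv * eta - np.exp(eta) - math.lgamma(yv + 1.0)

    total = 0.0
    for t in range(len(y) - 1):
        terms = log_pois(y[t], a1) + log_pois(y[t + 1], a2) + logw2
        # log of the two-dimensional integral, computed stably on the log scale
        total += np.logaddexp.reduce(terms.ravel())
    return total
```

An estimate could then be obtained by passing the negative of `cpl2` to a general-purpose numerical optimizer, with $\phi$ and $\sigma^2$ suitably constrained.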
A comparison of the performance of the composite likelihood relative to the approximate
likelihood procedure described in Section 6.2.4 was made via a simulation study in Davis
and Yau (2011) (see Table 3 of the paper). The results show that $\mathrm{CPL}_2$ performed
comparably to the AIS estimates. It is also worth noting that using higher orders of $k$, such
as $k = 3$ and 4, often gave worse estimates.
Ultimately, the estimation objective is to compute the maximum likelihood estimates, and
there has been much effort, as described in earlier sections, in finding approximations
either to the likelihood function or to its optimizer. Even if one could compute the MLE
directly, the proof of consistency and asymptotic normality has not been fully argued. In
contrast, one potential advantage of composite likelihood methods is that one can give
a rigorous argument for the consistency and asymptotic normality of such estimates. We
give a brief outline of such an argument that follows the lines of the one given in Davis and
Yau (2011). For the setup of Example 1, let
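$$\mathrm{cpl}_t(\theta) = \mathrm{cpl}(\theta; y_t, y_{t+1}) = \log \int\!\!\int p(y_t \mid \alpha_t; \theta) \, p(y_{t+1} \mid \alpha_{t+1}; \theta) \, f_\theta(\alpha_t, \alpha_{t+1}) \, d\alpha_t \, d\alpha_{t+1},$$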
137 State Space Models for Count Time Series
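and note that $\mathrm{CPL}_2(\theta; \mathbf{y}_n) = \sum_{t=1}^{n-1} \mathrm{cpl}_t(\theta)$. Let
$\theta_0$ and $\hat\theta$ be the true value and the $\mathrm{CPL}_2$ estimator of the parameter,
respectively. Using a Taylor series expansion of $\mathrm{CPL}_2'(\hat\theta; \mathbf{y}_n)$, the
derivative of $\mathrm{CPL}_2$, around $\theta_0$ shows that $\sqrt{n}(\hat\theta - \theta_0)$ is
asymptotically equivalent to
$$-\left( \frac{1}{n} \sum_{t=1}^{n-1} \mathrm{cpl}_t''(\theta_0) \right)^{-1} \frac{1}{\sqrt{n}} \sum_{t=1}^{n-1} \mathrm{cpl}_t'(\theta_0). \tag{6.30}$$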
Since the process $\{Y_t\}$ is stationary and strongly mixing at a geometric rate, it follows
from the ergodic theorem that
$$\frac{1}{n} \sum_{t=1}^{n-1} \mathrm{cpl}_t''(\theta_0) \xrightarrow{\text{a.s.}} E\big(\mathrm{cpl}_1''(\theta_0)\big).$$
Moreover, since $\{\mathrm{cpl}_t'(\theta_0)\}$ is also a stationary and strongly mixing sequence,
a standard central limit theorem for strongly mixing sequences (e.g., Doukhan, 1994) shows the
asymptotic normality of $\frac{1}{\sqrt{n}} \sum_{t=1}^{n-1} \mathrm{cpl}_t'(\theta_0)$ with
covariance matrix
$$\sum_{h=-\infty}^{\infty} \gamma(h),$$
where $\gamma(h)$ is the autocovariance matrix of $\{\mathrm{cpl}_t'(\theta_0)\}$. Hence,
$\sqrt{n}(\hat\theta - \theta_0)$ is asymptotically normal with mean 0 and covariance matrix
given by
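$$\Sigma := \big( E\,\mathrm{cpl}_1''(\theta_0) \big)^{-1} \left\{ \sum_{h=-\infty}^{\infty} \gamma(h) \right\} \big( E\,\mathrm{cpl}_1''(\theta_0) \big)^{-1}. \tag{6.31}$$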
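A consistent estimator for $\Sigma$ is given by
$$\hat\Sigma_n = \left( \frac{1}{n} \sum_{t=1}^{n-1} \mathrm{cpl}_t''(\hat\theta) \right)^{-1} \left\{ \sum_{k=-r_n}^{r_n} \left( 1 - \frac{|k|}{n} \right) \hat\gamma(k) \right\} \left( \frac{1}{n} \sum_{t=1}^{n-1} \mathrm{cpl}_t''(\hat\theta) \right)^{-1}, \tag{6.32}$$
where $r_n \to \infty$, $r_n/n \to 0$, and
$$\hat\gamma(k) = \frac{1}{n} \sum_{t=k+1}^{n-1} \mathrm{cpl}_t'(\hat\theta) \, \mathrm{cpl}_{t-k}'(\hat\theta)^{T}.$$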
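The estimator (6.32) can be sketched in a few lines of Python once the per-pair scores $\mathrm{cpl}_t'(\hat\theta)$ and Hessians $\mathrm{cpl}_t''(\hat\theta)$ are available, for example from numerical differentiation. The function name `sandwich_cov` and the array layout are assumptions of this sketch, as is the convention $\hat\gamma(-k) = \hat\gamma(k)^T$ for negative lags, which the displayed formula does not spell out.

```python
import numpy as np

def sandwich_cov(scores, hessians, r_n):
    """Sandwich covariance estimate in the form of (6.32) (sketch).

    scores   : array of shape (n-1, p), row t holds cpl_t'(theta_hat)
    hessians : array of shape (n-1, p, p), slice t holds cpl_t''(theta_hat)
    r_n      : truncation lag (non-negative integer, chosen by the user)
    """
    scores = np.asarray(scores, dtype=float)
    hessians = np.asarray(hessians, dtype=float)
    m = scores.shape[0]          # number of consecutive pairs, n - 1
    n = m + 1

    # Outer factor: inverse of the averaged Hessian
    bread = np.linalg.inv(hessians.sum(axis=0) / n)

    def gamma_hat(k):
        # (1/n) * sum over t = k+1, ..., n-1 of s_t s_{t-k}^T, for k >= 0
        return scores[k:].T @ scores[:m - k] / n

    # Middle factor: weighted sum of sample autocovariances of the scores
    meat = gamma_hat(0)
    for k in range(1, r_n + 1):
        g = gamma_hat(k)
        # assumed convention: gamma_hat(-k) = gamma_hat(k)^T
        meat = meat + (1.0 - k / n) * (g + g.T)

    return bread @ meat @ bread
```

Standard errors for $\hat\theta$ would then follow as the square roots of the diagonal of the returned matrix divided by $n$. With these weights the middle factor is not guaranteed to be positive definite in finite samples, so the output should be checked before use.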
The asymptotic variance of a composite likelihood estimator typically has a sandwich-type
form as given by (6.31). Such quantities can be difficult to estimate. One approach, in
addition to using (6.32), is via the bootstrap for time series. The block bootstrap or stationary
bootstrap (see the discussion paper Politis et al. (2003) for a description of these methods)
can be used for generating nonparametric bootstrap replicates of a stationary time series.
This methodology provides an attractive alternative for computing asymptotic variances of
the estimates and for providing approximations to the sampling distribution of
$\sqrt{n}(\hat\theta - \theta_0)$.
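The following Python sketch illustrates one such resampling scheme, the stationary bootstrap, in which blocks of random (geometrically distributed) length are concatenated with wrap-around. It is a generic illustration rather than the procedure used in any of the cited papers. The function names, the default mean block length, and the number of replicates are hypothetical choices made for this example.

```python
import numpy as np

def stationary_bootstrap_indices(n, mean_block, rng):
    """Index vector for one stationary-bootstrap replicate of length n."""
    p = 1.0 / mean_block           # probability of starting a new block
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < p:
            idx[t] = rng.integers(n)           # start a new block at a random position
        else:
            idx[t] = (idx[t - 1] + 1) % n      # continue the current block, wrapping around
    return idx

def bootstrap_se(y, estimator, n_boot=200, mean_block=10, seed=0):
    """Bootstrap standard error of estimator(y) for a stationary series y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    reps = np.array([
        estimator(y[stationary_bootstrap_indices(len(y), mean_block, rng)])
        for _ in range(n_boot)
    ])
    return reps.std(axis=0, ddof=1)
```

Here `estimator` would be any function mapping a series to an estimate, for instance a routine that maximizes the pairwise likelihood. The replicates themselves, rather than only their standard deviation, could be retained to approximate the sampling distribution.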
6.3 Applications to Analysis of Polio Data
In this section, we summarize a variety of analyses using the Poisson AR model for the
Polio data set consisting of the monthly number of U.S. cases of Poliomyelitis from 1970 to
1983, first analysed by Zeger (1988). We parameterize the model as in Davis and
Rodriguez-Yam (2005), for example, in which the distribution of $Y_t$ given the state $\alpha_t$ is
Poisson with rate $\lambda_t = e^{\alpha_t + x_t^T \beta}$. Here, $\beta^T := (\beta_1, \ldots, \beta_6)$,
$x_t$ is the vector of covariates given by
$$x_t^T = (1, \; t/1000, \; \cos(2\pi t/12), \; \sin(2\pi t/12), \; \cos(2\pi t/6), \; \sin(2\pi t/6)),$$
and the state process is assumed to follow an AR(1) model. The vector of parameters is
$\theta = (\beta_1, \ldots, \beta_6, \phi, \sigma^2)$.
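As a small illustration of this parameterization, the covariate vectors can be stacked into a design matrix as in the Python sketch below. The function names are hypothetical, and the sketch assumes that the time index runs over $t = 1, \ldots, n$ with $n = 168$ months (14 years of monthly data); any centring of $t$ used in a particular analysis is not reproduced here.

```python
import numpy as np

def polio_design(n=168):
    """Design matrix whose row t is x_t^T (time index t = 1..n is an assumption)."""
    t = np.arange(1, n + 1)
    return np.column_stack([
        np.ones(n),                      # intercept
        t / 1000.0,                      # linear trend
        np.cos(2 * np.pi * t / 12),      # annual cycle
        np.sin(2 * np.pi * t / 12),
        np.cos(2 * np.pi * t / 6),       # semi-annual cycle
        np.sin(2 * np.pi * t / 6),
    ])

def poisson_rate(alpha, X, beta):
    """Conditional Poisson rate lambda_t = exp(alpha_t + x_t^T beta)."""
    return np.exp(np.asarray(alpha) + X @ np.asarray(beta))
```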
Table 6.1 compiles, from a variety of sources, the estimates and their standard errors for
the key parameters in this model, namely the coefficient of the linear time trend, $\beta_2$, the
serial autocorrelation of the latent process, $\phi$, and the innovation variance, $\sigma^2$. Note
that in some analyses, the parameterization of the latent process variance as
$\sigma_\alpha^2 = \sigma^2/(1 - \phi^2)$ is used. The table adjusts these results to the above
parameterization. Estimates of this process variance, obtained as
$\hat\sigma_\alpha^2 = \hat\sigma^2/(1 - \hat\phi^2)$, are presented as the final column to allow
additional comparison between the various model fits.
TABLE 6.1
Estimates and Standard Errors for Key Parameters in Various Methods Applied to the Polio Series

Method (Source)                            $\hat\beta_2$  se($\hat\beta_2$)  $\hat\phi$  se($\hat\phi$)  $\hat\sigma^2$  $\hat\sigma_\alpha^2$
MCEM (Chan and Ledolter, 1995)             -4.62          1.38               0.89        0.04            0.09            0.41
MCEM[NL] (McCulloch, 1997)                 -4.35          1.96               0.10        0.36            0.50            0.51
Bayes (Oh and Lim, 2001)                   -4.24          1.72               0.66        0.16            0.32            0.56
PQL[NL] (Breslow and Clayton, 1993)        -3.46          3.04               0.70        0.13            0.26            0.51
AL (Davis and Rodriguez-Yam, 2005)         -3.81          2.77               0.63        0.23            0.29            0.48
AL-BC (Davis and Rodriguez-Yam, 2005)      -3.96          2.77               0.73        0.23            0.30            0.65
AIS (Davis and Rodriguez-Yam, 2005)        -3.75          2.87               0.66        0.21            0.27            0.48
AIS-BC (Davis and Rodriguez-Yam, 2005)     -3.76          2.87               0.73        0.21            0.30            0.64
MCNR (Kuk and Cheng, 1999)                 -3.82          2.77               0.67        0.18            0.27            0.48
EIS (Jung and Liesenfeld, 2001)            -3.61          2.57               0.68        0.15            0.26            0.48
GLM (Davis et al., 2000)                   -4.80          4.11
GEE (Zeger, 1988)                          -4.35          2.68               0.82                        0.19            0.57
CPL_2 (Davis and Yau, 2011)                -4.74          2.54               0.49        0.21            0.37            0.49
IBC[NL] (Kuk, 1995)                        -5.01          3.20               0.54        0.28            0.35            0.49

Note: $\hat\sigma_\alpha^2 = \hat\sigma^2/(1 - \hat\phi^2)$.
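The relation in the table note can be checked directly against the tabulated values. The short Python sketch below recomputes $\hat\sigma^2/(1 - \hat\phi^2)$ for the rows of Table 6.1 that report all of $\hat\phi$, $\hat\sigma^2$, and $\hat\sigma_\alpha^2$, and compares the result with the final column. The variable names are illustrative, and small discrepancies are to be expected because the inputs are rounded to two decimals.

```python
# (phi_hat, sigma2_hat, reported sigma2_alpha_hat) copied from Table 6.1
rows = {
    "MCEM":     (0.89, 0.09, 0.41),
    "MCEM[NL]": (0.10, 0.50, 0.51),
    "Bayes":    (0.66, 0.32, 0.56),
    "PQL[NL]":  (0.70, 0.26, 0.51),
    "AL":       (0.63, 0.29, 0.48),
    "AL-BC":    (0.73, 0.30, 0.65),
    "AIS":      (0.66, 0.27, 0.48),
    "AIS-BC":   (0.73, 0.30, 0.64),
    "MCNR":     (0.67, 0.27, 0.48),
    "EIS":      (0.68, 0.26, 0.48),
    "CPL_2":    (0.49, 0.37, 0.49),
    "IBC[NL]":  (0.54, 0.35, 0.49),
}

def process_variance(phi, sigma2):
    """Stationary variance of an AR(1) process: sigma^2 / (1 - phi^2)."""
    return sigma2 / (1.0 - phi ** 2)

# Difference between the recomputed and the reported process variance
discrepancy = {
    name: process_variance(phi, s2) - reported
    for name, (phi, s2, reported) in rows.items()
}

for name, d in discrepancy.items():
    print(f"{name:9s} recomputed minus reported = {d:+.3f}")
```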
The origin of each method and of its results for the Polio data set is listed in parentheses;
results obtained from the application of a method by Nelson and Leroux (2006) are
additionally marked with the annotation 'NL'. The methods can be roughly partitioned into
three groups. Group 1 consists of two implementations of MCEM (Monte Carlo EM) and a
Bayes procedure. Group 2, which is essentially approximate likelihood-based methods,
consists of PQL (penalized quasilikelihood), AL (approximate likelihood), AL-BC (bias
corrected AL), AIS (approximate importance sampling), AIS-BC (bias corrected AIS), MCNR
(Monte Carlo Newton–Raphson), and EIS (efficient importance sampling). Note that the first
3 procedures of this group are nonsimulation based, while the last 4 involve some level of
simulation. Group 3 consists of nonlikelihood-based procedures: GLM (generalized linear
model estimates ignoring the latent process), GEE (generalized estimating equations),
$\mathrm{CPL}_2$ (pairwise composite likelihood), and IBC (iterative bias correction using
iterative weighted least squares). We exclude from our review the few studies that have used
alternative response distributions or latent process distributions for these data so that the
methods are compared on the same model.
With the exception of the GLM, GEE, $\mathrm{CPL}_2$, and Bayesian analyses, all other methods
aim to obtain approximations to the likelihood estimates and their standard errors. Clearly
there are both substantial differences and similarities between the results for various
methods, a point also noted in Nelson and Leroux (2006). We now discuss these differences and
similarities in more detail in an attempt to draw some general conclusions about which
methods may be preferred. Of course, this comparison is only for application to a single
data set and much more research is required before general conclusions can be drawn.
However, this is the only data set for which all the methods listed have been applied.
Unfortunately, simulation evidence comparing the variety of methods is rather limited with the
exception of the results in Nelson and Leroux (2006).
6.3.1 Estimate of Trend Coefficient $\hat\beta_2$
The GLM, the IBC method as implemented by Nelson and Leroux (2006), and $\mathrm{CPL}_2$ give
the most negative trend estimates. It would appear that the IBC method is not adjusting the bias
of the GLM estimate sufficiently well, and this may be a result of iterative weighted least
squares being used as the basis for the bias adjustment simulations. It is likely that these
methods are substantially biased. Amongst the remaining methods, there appear to be two
groups of values for the trend coefficient estimates: Group 1, the values for both
implementations of MCEM and the Bayes fit; and Group 2, based on approximations to the
likelihood with and without importance sampling (PQL, AL, AL-BC, AIS, AIS-BC, MCNR, and EIS).
The concordance in Group 2 is perhaps not surprising since they are all aimed at
approximating the likelihood. However, it is surprising that the Group 1 results do not agree
as closely with the Group 2 results. Turning to comparison of the estimated standard errors,
those for Group 1 appear to be substantially smaller than those for Group 2, and within this
latter group there is considerable agreement. Also note that the MCEM and Bayes methods are
biasing the point estimates towards larger negative values and biasing the associated
estimated standard errors downwards. The net effect of these two biases would be to increase
the ratio of estimate to standard error, resulting in a higher chance of concluding that there
is a significant downward trend in Polio cases over the time period of observation. On the
other hand, for Group 2, these test ratios would all be consistent with a conclusion of no
downward trend.
6.3.2 Estimate of Latent Process Parameters
Interestingly, the estimates of overall variance $\hat\sigma_\alpha^2$ are remarkably similar for
all methods apart from the GEE method, the Bayes method, and the bias corrected AL and AIS
methods. This suggests that the likelihood-based methods (including the MCEM methods) are
all finding the same degree of overall variability in the latent process. However, the two
MCEM methods differ substantially in their identification of the source of this latent process
variability. The MCEM method of McCulloch (1997) severely underestimates the
autocorrelation $\phi$, with correspondingly larger values for $\hat\sigma^2$, when compared with
the MCEM method as implemented by Chan and Ledolter (1995) and the other likelihood
approximations. The reason for this is not clear. However, since the only difference between
the two MCEM methods is that of Gibbs sampler or Metropolis–Hastings, it may be that these are
not exploring the sample space sufficiently well when Monte Carlo draws are being generated,
resulting in what appears to be lack of identifiability between $\phi$ and $\sigma^2$. The bias
adjusted AL and AIS methods and the CPL method appear to apportion overall variability to
autocorrelation and innovation variance differently than the other likelihood approximations
(AL, AIS, MCNR, and EIS), which are quite consistent with each other. Incidentally, the use of
automatic differentiation (AD) as in Skaug (2002) gives identical results to the AL method and,
because of this, these results are not recorded in Table 6.1. Further, the use of $M = 100$,
$M = 1000$, or $M = 5000$ importance samples as reported independently by Skaug (2002) has very
little impact on point estimates.
6.3.3 Comparisons of Computational Speed
Nelson and Leroux (2006) and Skaug (2002) provide some information on comparison of
speeds between some of the methods. However, they mix speeds reported by the original
authors with those obtained in their applications and simulations, acknowledging that
different generation computer processors were used. There is no comprehensive comparison
of speeds for the models listed in Table 6.1. However, it is clear that the approximate
likelihood methods are the fastest overall, requiring no simulations or Monte Carlo to obtain
estimates.
6.3.4 Some Recommendations
Based on this comparison on a single data set (with all the limitations for generality that
implies):
1. Overall, the use of Laplace approximation to the likelihood results in point
estimates and standard errors that are sufficiently close to those obtained from the
more accurate importance sample augmented approximations. It would appear
that, for these data at least, importance sampling is not providing much additional
benefit to inference.
2. Use of CPL appears to provide biased results and has no clear computational
advantage over the Laplace approximation method.
3. The MCEM method should be avoided until an explanation can be found for the
obvious differences between results from two different implementations (Gibbs
versus Metropolis–Hastings sampling) and for the clear bias in point estimates and
underestimation of standard errors that the method produces.