66 Handbook of Discrete-Valued Time Series
TABLE 3.1
Results for testing various fixed effect multiple GLARMA models for the Road Deaths Series in 17 U.S. States

| Model | $-2\log L$ | $S$ | $G^2$ | d.f. | p-value |
|---|---|---|---|---|---|
| FE-I: Unrestricted | 5345.69 | 92 | | | |
| FE-II: $\phi$'s in 6 groups | 5347.87 | 81 | $G^2_{\text{II v I}} = 2.18$ | 11 | 0.998 |
| FE-III: BAC, ALR, FS same | 5391.76 | 43 | $G^2_{\text{III v II}} = 43.89$ | 38 | 0.236 |
| FE-IV: BAC, ALR, FS, lnOMVD same | 5436.57 | 27 | $G^2_{\text{IV v III}} = 44.81$ | 16 | 0.00015 |
| | | | $G^2_{\text{IV v II}} = 88.61$ | 54 | 0.0021 |
We next check whether the regression coefcients in W
jt
vary signicantly between indi-
vidual states. We begin with the overall unrestricted t to all 17 states. We refer to this
as Model FE-I, which has 2 log L = 5345.69 with S = 92 parameters. Examination of
the individual estimates φ
ˆ
12
suggested that they could be simplied as follows: Group 1
(State 11, φ
ˆ
12
=−0.081 ± 0.045), Group 2 (States 1, 6, 7, 9, 10, 12:17, φ
ˆ
12
= 0.005 ± 0.013),
Group 3 (States 2, 4, φ
ˆ
12
= 0.066 ± 0.015), Group 4 (State 5, φ
ˆ
12
= 0.212 ± 0.087), Group 5
(State 8, φ
ˆ
12
= 0.401 ± 0.096), Group 6 (State 5, φ
ˆ
12
= 0.545 ± 0.219) in which, at most, 6
φ
12
coefcients are signicant.
The model with $\phi_{12}$ restricted to these groups is referred to as Model FE-II in Table 3.1. Using the likelihood ratio test, we obtain $G^2_{\text{II v I}} = 2.18$ on 11 d.f.; hence, the restriction of $\phi_{12}$ to these groups would not be rejected. Starting from this model, we then examined whether some or all of the regression coefficients (other than the intercept, which does vary substantially between states) take common values across all 17 states. Model FE-III restricts the coefficients for BAC, ALR, and Friday–Saturday to be the same across states and (see Table 3.1) gives $G^2_{\text{III v II}} = 43.89$ on 38 d.f., with an associated p-value of 0.24. This is not sufficiently strong evidence to suggest that the impact of these variables differs between individual states in a statistically significant way. Next, Model FE-IV additionally restricts the log OMVD coefficient to be common across states. Compared with Model FE-III or Model FE-II, this restriction is strongly rejected, so the effect of this risk control variable differs significantly between states: $G^2_{\text{IV v III}} = 44.81$ on 16 d.f. (p-value 0.00015) and $G^2_{\text{IV v II}} = 88.61$ on 54 d.f. (p-value 0.0021).
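The entries of Table 3.1 follow from the usual nested-model likelihood ratio test: $G^2$ is the increase in $-2\log L$ when the restrictions are imposed, referred to a $\chi^2$ distribution whose degrees of freedom equal the drop in $S$. Below is a minimal, stdlib-only Python sketch. It evaluates the $\chi^2$ tail with a series expansion of the incomplete gamma function; a library routine such as SciPy's would normally be used instead. The names `MODELS`, `chi2_sf` and `lr_test` are illustrative.

```python
import math

# (-2 log L, number of parameters S) for each fixed effects model in Table 3.1.
MODELS = {
    "FE-I": (5345.69, 92),
    "FE-II": (5347.87, 81),
    "FE-III": (5391.76, 43),
    "FE-IV": (5436.57, 27),
}


def chi2_sf(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x).

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for the moderate values used here.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a) + math.log(total))
    return max(0.0, 1.0 - lower)


def lr_test(restricted: str, larger: str):
    """Likelihood ratio test of a restricted model against a larger one."""
    m2ll_r, s_r = MODELS[restricted]
    m2ll_big, s_big = MODELS[larger]
    g2 = m2ll_r - m2ll_big   # G^2: increase in -2 log L caused by the restrictions
    df = s_big - s_r         # number of restrictions imposed
    return g2, df, chi2_sf(g2, df)


if __name__ == "__main__":
    for r, big in [("FE-II", "FE-I"), ("FE-III", "FE-II"), ("FE-IV", "FE-III")]:
        g2, df, p = lr_test(r, big)
        print(f"{r} v {big}: G2 = {g2:.2f} on {df} d.f., p = {p:.3g}")
```

Run on the tabulated values, it gives $G^2 = 2.18$, 43.89 and 44.81 on 11, 38 and 16 d.f., with p-values of about 0.998, 0.236 and 0.00015.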
Hence, Model FE-III provides a useful summary of the commonality or otherwise of regression variable impacts on single-vehicle nighttime road deaths across the 17 states. The fitted parameters and associated standard errors are reported in Table 3.2. The six groups for $\phi_{12}$ could be reduced to four by removing the nonsignificant cases of Groups 1 and 2. We did not pursue this here, preferring to move on to a random effects analysis. The impact of lowering the legal BAC level is estimated to be $\hat\beta_1 = -0.072 \pm 0.022$, confirming the statistical significance of this association found in Bernat et al. (2004).
The xed effects GLARMA model analysis provides a good starting point for the random
effects GLARMA modeling that we turn to in the next section. In particular, it seems plau-
sible from the results of Table 3.1 that random effects will be needed for the intercept term
and the log OMVD term, but not for BAC, ALR, or Friday–Saturday effects. The parame-
ter values reported for Model FE-III in Table 3.2 can provide useful starting values for the
random effects model tting. For xed effects, we use the point estimates of coefcients for
predictors that are common to all series, while for predictors that vary between series, we
use the mean values of point estimates of the coefcients.
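Here is a sketch of the starting-value rule just described, with hypothetical function and argument names. Coefficients shared by all series are passed through unchanged, and series-specific coefficients start at the mean of their per-series estimates. Returning the sample standard deviation of the per-series estimates as a starting value for the random effect s.d. is an extra assumption suggested by the note to Table 3.2, not something the text prescribes.

```python
import statistics


def start_values(common: dict, varying: dict) -> dict:
    """Starting values for a random effects fit taken from fixed effects output.

    common  -- name -> estimate of a coefficient shared by all series
    varying -- name -> list of per-series estimates of a series-specific coefficient
    """
    fixed = dict(common)                          # shared coefficients: use as-is
    re_sd = {}
    for name, values in varying.items():
        fixed[name] = statistics.mean(values)     # mean of the per-series estimates
        # Extra assumption (not prescribed in the text): start the random
        # effect s.d. at the spread of the per-series estimates.
        re_sd[name] = statistics.stdev(values)
    return {"fixed": fixed, "re_sd": re_sd}


if __name__ == "__main__":
    # Hypothetical per-state estimates for illustration only; the BAC value
    # is the common FE-III estimate quoted in the text.
    print(start_values(
        common={"BAC": -0.072},
        varying={"intercept": [1.6, 1.8, 2.0], "logOMVD": [0.2, 0.3, 0.4]},
    ))
```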
67 Generalized Linear Autoregressive Moving Average Models
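TABLE 3.2
Parameter estimates for the random effects model for the Road Deaths Series in 17 U.S. States

Column groups: RE-IV (no GLARMA, random effects); FE-III (multiple GLARMA, fixed effects); RE-III (multiple GLARMA, random effects).

| Parameter | RE-IV estimate | RE-IV s.e. | FE-III estimate | FE-III s.e. | RE-III estimate | RE-III s.e. |
|---|---|---|---|---|---|---|
| $\beta_0$ (intercept) | 1.649 | 0.116 | 1.801 | | 1.705 | 0.119 |
| $\beta_1$ (BAC change) | 0.054 | 0.022 | 0.072 | 0.022 | 0.060 | 0.022 |
| $\beta_2$ (ALR term) | 0.063 | 0.035 | 0.011 | 0.039 | 0.047 | 0.037 |
| $\beta_3$ (Fri–Sat) | 0.032 | 0.011 | 0.037 | 0.011 | 0.037 | 0.011 |
| $\beta_4$ (logOMVD) | 0.395 | 0.063 | 0.314 | | 0.367 | 0.061 |
| Intercept RE s.d. | 0.241 | 0.053 | 0.209 | | 0.242 | 0.054 |
| logOMVD RE s.d. | 0.160 | 0.066 | 0.140 | | 0.145 | 0.072 |
| $\phi_{\text{Gp1}}$ | | | 0.066 | 0.045 | | |
| $\phi_{\text{Gp2}}$ | | | 0.010 | 0.012 | | |
| $\phi_{\text{Gp3}}$ | | | 0.067 | 0.015 | 0.071 | 0.015 |
| $\phi_{\text{Gp4}}$ | | | 0.216 | 0.080 | 0.201 | 0.074 |
| $\phi_{\text{Gp5}}$ | | | 0.404 | 0.091 | 0.408 | 0.090 |
| $\phi_{\text{Gp6}}$ | | | 0.531 | 0.213 | | |
| $-2$ log-likelihood | 5552.173 | | 5391.1 | | 5505.082 | |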
The results labeled RE-IV are those reported in Bernat et al. (2004) using SAS PROC NLMIXED. The results labeled FE-III are for the final fixed effects multiple GLARMA model discussed in Section 3.5.2. The results labeled RE-III are for the final random effects multiple GLARMA model discussed in Section 3.6.3.
Note: Values reported against the intercept $\beta_0$ and the logOMVD term $\beta_4$ are averages of the 17 individual values obtained, while the values in the rows labeled “Intercept RE” and “logOMVD RE” are the standard deviations of these individual estimates, respectively.
3.6 Random Effects Multiple GLARMA Model
3.6.1 Maximum Likelihood Estimation
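Let $W_{jt}$ be defined as in (3.15), where the $U_j$ are multivariate normal. Let $\theta = \big(\beta^{(1)}, \ldots, \beta^{(J)}, \tau^{(1)}, \ldots, \tau^{(J)}, \lambda\big)$ now be the collection of parameters in the GLARMA models and the random effects parameters. The joint log-likelihood is now

$$
l(\theta) = \sum_{j=1}^{J} l_j\big(\beta^{(j)}, \tau^{(j)}, \lambda\big), \tag{3.21}
$$

where

$$
l_j\big(\beta^{(j)}, \tau^{(j)}, \lambda\big) = \log \int_{\mathbb{R}^d} \exp\Big(l_j\big(\beta^{(j)}, \tau^{(j)} \mid u\big)\Big)\, g_U\big(u; \Sigma(\lambda)\big)\, du, \tag{3.22}
$$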
and $g_U(u; \Sigma(\lambda))$ is the multivariate normal density. To proceed further, we parameterize the covariance matrix as $\Sigma = LL^T$, where $L$ is lower triangular, and let $U_j = L\zeta_j$, where the $\zeta_j$ are independent $N(0, I_d)$. Let $\lambda = \operatorname{vech}(L)$ be the half-vectorization of $L$. With this parameterization, we can rewrite $W_{jt}$ in (3.5) linearly in terms of $\lambda$ as
$$
W_{jt} = x_{j,t}^T \beta^{(j)} + \operatorname{vech}\big(r_{j,t}\, \zeta_j^T\big)^T \lambda + Z_{jt}. \tag{3.23}
$$
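The log-likelihood (3.22) becomes

$$
l_j\big(\beta^{(j)}, \tau^{(j)}, \lambda\big) = \log \int_{\mathbb{R}^d} \exp\Big(l_j\big(\beta^{(j)}, \tau^{(j)}, \lambda \mid \zeta\big)\Big)\, g(\zeta)\, d\zeta, \tag{3.24}
$$

where $g(\zeta)$ is the $d$-fold product of the standard normal density and

$$
l_j\big(\beta^{(j)}, \tau^{(j)}, \lambda \mid \zeta\big) = \sum_{t=1}^{n_j} \Big[ y_{jt} W_{jt} - a_{jt}\, b(W_{jt}) + c(y_{jt}) \Big].
$$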
Note that (3.23) has the same form as (3.3), but with $\lambda$ treated as a vector of regression parameters, for any fixed value of the vector $\zeta_j$, whose covariates are formed from $\zeta_j$ and the random effects covariates $r_{j,t}$.
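The reason $\lambda$ can be treated as a vector of regression parameters is the identity $r^T L \zeta = \operatorname{vech}(r\,\zeta^T)^T \operatorname{vech}(L)$, which holds for any lower-triangular $L$: only the products $r_a \zeta_b$ with $a \ge b$ multiply nonzero elements of $L$. The following small pure-Python check (function names are illustrative) confirms it numerically.

```python
def vech(m):
    """Half-vectorization: lower-triangular entries stacked column by column."""
    d = len(m)
    return [m[i][j] for j in range(d) for i in range(j, d)]


def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def random_effect_term(r, L, zeta):
    """r^T L zeta, computed directly."""
    d = len(r)
    return sum(r[i] * L[i][j] * zeta[j] for i in range(d) for j in range(d))


def linear_in_lambda(r, L, zeta):
    """vech(r zeta^T)^T vech(L): the same term written linearly in lambda = vech(L)."""
    return sum(c * lam for c, lam in zip(vech(outer(r, zeta)), vech(L)))


if __name__ == "__main__":
    L = [[0.5, 0.0, 0.0], [0.2, 0.3, 0.0], [-0.1, 0.4, 0.6]]   # lower triangular
    r = [1.0, 2.0, -1.0]
    zeta = [0.3, -0.7, 1.1]
    print(random_effect_term(r, L, zeta), linear_in_lambda(r, L, zeta))
```

For fixed $\zeta$, the entries of $\operatorname{vech}(r\,\zeta^T)$ act as ordinary covariates, which is what lets standard GLARMA regression code handle $\lambda$.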
The representation of the random effects covariance matrix as $\Sigma = LL^T$ allows the parameter $\lambda$ to enter into the conditional log-likelihood linearly and without bounding constraints. Both properties enable existing GLARMA software to calculate the log-likelihood and its derivatives with respect to the parameters. When some elements of $\Sigma$, and hence of $L$, are specified as zero to reflect zero covariance between some of the random effects, $\lambda$ is the half-vectorization of $L$ with the structural zeros removed. Not every pattern of zero covariances between random effects can be represented in this form directly. However, such patterns can often be accommodated by reordering the random effect variables and setting the appropriate elements of $L$ to zero.
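To illustrate structural zeros: with two uncorrelated random effects, as in Model RE-III, $L$ is diagonal and $\lambda$ keeps only its two diagonal entries. The hypothetical helper below rebuilds $L$ from $\lambda$ and a list of free positions and then forms $\Sigma = LL^T$. The two values used are the RE-III random effect standard deviations from Table 3.2, serving only as an example.

```python
def L_from_lambda(lam, free, d):
    """Fill a d x d lower-triangular L from lambda.

    free -- (row, col) positions, row >= col, that are estimated, in the same
            order as lambda; every other entry of L is a structural zero.
    """
    if len(lam) != len(free):
        raise ValueError("lambda and the list of free positions differ in length")
    L = [[0.0] * d for _ in range(d)]
    for value, (i, j) in zip(lam, free):
        if i < j:
            raise ValueError("L must be lower triangular")
        L[i][j] = value
    return L


def cov_from_L(L):
    """Sigma = L L^T."""
    d = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(d)) for j in range(d)]
            for i in range(d)]


if __name__ == "__main__":
    # Diagonal L: uncorrelated intercept and log OMVD random effects.
    L = L_from_lambda([0.242, 0.145], free=[(0, 0), (1, 1)], d=2)
    print(cov_from_L(L))
    # Full lower-triangular L: lambda = vech(L) has three free entries.
    print(cov_from_L(L_from_lambda([1.0, 0.5, 2.0], [(0, 0), (1, 0), (1, 1)], 2)))
```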
3.6.2 Laplace Approximation and Adaptive Gaussian Quadrature
For any xed θ, computation of the log-likelihood l(θ) requires calculation of the J inte-
grals dened in (3.24). We now outline an approximate method based on the Laplace
approximation and adaptive Gaussian quadrature (AGQ). The integral in (3.24) can be
rewritten as
1
L
j
(θ) =
(2π)
d/2
exp
F
j
(ζ|θ)
dζ
R
d
where the exponent is considered as a function of ζ for xed parameters θ and is
dened as
n
j
ζ
ζ
F
j
(ζ|θ) =
y
jt
W
jt
(ζ; x
jt
, θ)) a
jt
b(W
jt
(ζ; x
jt
, θ)) + c(y
jt
)
, (3.25)
2
t=1
n n
where

$$
W_{jt}(\zeta; x_{jt}, \theta) = r_{j,t}^T L\, \zeta + x_{j,t}^T \beta^{(j)} + Z_{jt} \tag{3.26}
$$
is treated as a function of $\zeta$ for $x_{j,t}^T \beta^{(j)}$ fixed. To find the Laplace approximation, we expand the exponent $F_j(\zeta)$ around its modal value in a second-order Taylor series and ignore the remainder; the resulting Gaussian integral can be obtained in closed form. Note that $Z_{jt}$ in (3.26) is also a function of $\zeta$. Hence, the contributions from the summation term in (3.25) to the first and second derivatives required for the Taylor series expansion of $F_j$ need to be calculated using the GLARMA software, with $\zeta$ treated as a regression parameter for covariates $r_{j,t}^T L$ and with $x_{j,t}^T \beta^{(j)}$ fixed as the offset term. To find the modal value, we need to find $\hat\zeta_j$, which solves

$$
\frac{\partial F_j(\hat\zeta_j)}{\partial \zeta} = 0.
$$

The Newton–Raphson method is used to find $\hat\zeta_j$ and, at convergence, we set

$$
\hat\Sigma_j = -\left[\frac{\partial^2 F_j(\hat\zeta_j)}{\partial \zeta\, \partial \zeta^T}\right]^{-1}.
$$
Since $-\partial^2 F_j(\zeta)/\partial \zeta\, \partial \zeta^T$ is almost surely positive definite for the canonical link exponential family, the Newton–Raphson method will converge to the modal solution from any starting point; we use $\zeta_j^{(0)} = 0$ to initiate the recursions.
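Here is a minimal one-dimensional sketch of the mode search for a hypothetical toy series rather than the road deaths data. It uses Poisson counts with a log link ($b(W) = e^W$, $a_{jt} = 1$) and a single random intercept ($d = 1$, $r_{j,t} = 1$, $L = \sigma$). As a simplifying assumption, the GLARMA term $Z_{jt}$ is set to zero, so that $W_{jt} = x_{j,t}\beta + \sigma\zeta$. In this case $-\partial^2 F_j/\partial\zeta^2 = 1 + \sigma^2\sum_t \mu_t > 0$ for every $\zeta$, and for these data the recursion started at $\zeta^{(0)} = 0$ converges in a few steps.

```python
import math

# Hypothetical toy series (not the road deaths data): Poisson counts, log link
# (b(W) = exp(W), a_t = 1), one random intercept (d = 1, r_t = 1, L = sigma),
# and the GLARMA term Z_t set to 0, so W_t = x_t * beta + sigma * zeta.
Y = [3, 5, 2, 6, 4, 7, 3, 5]
X = [1.0] * len(Y)


def F(zeta, beta, sigma):
    """Exponent F(zeta | theta) of (3.25) for the toy model."""
    total = 0.0
    for y, x in zip(Y, X):
        w = x * beta + sigma * zeta
        total += y * w - math.exp(w) - math.lgamma(y + 1)
    return total - 0.5 * zeta * zeta


def dF(zeta, beta, sigma):
    """First and second derivatives of F with respect to zeta."""
    grad, hess = -zeta, -1.0
    for y, x in zip(Y, X):
        mu = math.exp(x * beta + sigma * zeta)
        grad += (y - mu) * sigma
        hess -= mu * sigma * sigma
    return grad, hess


def mode_and_scale(beta, sigma, tol=1e-12, max_iter=50):
    """Newton-Raphson from zeta = 0; returns (zeta_hat, Sigma_hat = -1 / F'')."""
    zeta = 0.0
    for _ in range(max_iter):
        grad, hess = dF(zeta, beta, sigma)
        step = grad / hess
        zeta -= step
        if abs(step) < tol:
            break
    _, hess = dF(zeta, beta, sigma)
    return zeta, -1.0 / hess


if __name__ == "__main__":
    zeta_hat, sigma_hat = mode_and_scale(beta=1.2, sigma=0.3)
    print(zeta_hat, sigma_hat, F(zeta_hat, 1.2, 0.3))
```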
The Laplace approximation gives the approximate log-likelihood for the $j$th state as $\tilde l_j^{(1)}(\theta) = \log \det\big(\hat\Sigma_j(\theta)\big)^{1/2} + F_j\big(\hat\zeta_j(\theta)\big)$, which can be combined to give the overall approximate log-likelihood as

$$
\tilde l^{(1)}(\theta) = \sum_{j=1}^{J} \tilde l_j^{(1)}(\theta). \tag{3.27}
$$
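The sketch below uses the same hypothetical toy model (Poisson counts, one random intercept, $Z_{jt} = 0$). It evaluates the Laplace approximation $\tfrac12\log\hat\Sigma_j + F_j(\hat\zeta_j)$ and compares it with a brute-force midpoint-rule evaluation of $\log\big((2\pi)^{-1/2}\int \exp F_j(\zeta)\,d\zeta\big)$. It is repeated in full so that it runs on its own.

```python
import math

# Hypothetical toy model: Poisson counts, log link, one random intercept with
# scale sigma, and the GLARMA term Z_t set to 0 (a simplifying assumption).
Y = [3, 5, 2, 6, 4, 7, 3, 5]


def F(zeta, beta, sigma):
    """F(zeta | theta): conditional log-likelihood minus zeta^2 / 2."""
    w = beta + sigma * zeta
    return sum(y * w - math.exp(w) - math.lgamma(y + 1) for y in Y) - 0.5 * zeta ** 2


def mode_and_scale(beta, sigma):
    """Newton-Raphson for the mode of F; returns (zeta_hat, Sigma_hat)."""
    zeta, n, s = 0.0, len(Y), sum(Y)
    for _ in range(50):
        mu = math.exp(beta + sigma * zeta)
        step = (sigma * (s - n * mu) - zeta) / (-n * mu * sigma ** 2 - 1.0)
        zeta -= step
        if abs(step) < 1e-12:
            break
    return zeta, 1.0 / (n * math.exp(beta + sigma * zeta) * sigma ** 2 + 1.0)


def laplace_loglik(beta, sigma):
    """(1/2) log Sigma_hat + F(zeta_hat): the single-series term of (3.27)."""
    zeta_hat, s_hat = mode_and_scale(beta, sigma)
    return 0.5 * math.log(s_hat) + F(zeta_hat, beta, sigma)


def brute_force_loglik(beta, sigma, lo=-10.0, hi=10.0, n=20000):
    """log[(2 pi)^(-1/2) * integral of exp(F)] by the midpoint rule."""
    step = (hi - lo) / n
    vals = [F(lo + (k + 0.5) * step, beta, sigma) for k in range(n)]
    m = max(vals)                                   # log-sum-exp for stability
    total = sum(math.exp(v - m) for v in vals) * step
    return m + math.log(total) - 0.5 * math.log(2.0 * math.pi)


if __name__ == "__main__":
    print(laplace_loglik(1.2, 0.3), brute_force_loglik(1.2, 0.3))
```

For these data the two values agree closely. This matches the remark later in this section that the single-point Laplace approximation to the likelihood itself is often quite accurate.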
AGQ methods can be used to improve the approximation, as has been done successfully for likelihoods in other statistical models such as nonlinear and non-Gaussian mixed effects models. This approach is implemented as the default method in a number of widely used software systems; see Pinheiro and Bates (1995) and Pinheiro and Chao (2006) for examples. Our implementation of AGQ follows that of Pinheiro and Chao (2006). It relies on the mode, $\hat\zeta_j$, and $\hat\Sigma_j$ used in the Laplace approximation to center and scale $Q$ quadrature points in each of $d$ coordinates, resulting in integrands evaluated at $Q^d$ points. When $Q = 1$, the Laplace approximation is obtained.
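Here is a sketch of AGQ for the same hypothetical toy model ($d = 1$). It uses NumPy's Gauss–Hermite rule `numpy.polynomial.hermite.hermgauss`, whose nodes and weights are for the weight function $e^{-x^2}$. Nodes are centred at $\hat\zeta_j$ and scaled by $\sqrt{2\hat\Sigma_j}$, in the spirit of Pinheiro and Chao (2006) but not a transcription of their code. With $Q = 1$ the single node is the mode, so the result coincides with the Laplace approximation.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss

# Hypothetical toy model: Poisson counts, log link, one random intercept with
# scale sigma, and the GLARMA term Z_t set to 0 (a simplifying assumption).
Y = np.array([3, 5, 2, 6, 4, 7, 3, 5])
LOG_Y_FACT = np.array([math.lgamma(y + 1) for y in Y])


def F(zeta, beta, sigma):
    """F(zeta | theta) evaluated at an array of zeta values."""
    zeta = np.atleast_1d(np.asarray(zeta, dtype=float))
    eta = beta + sigma * zeta[:, None]              # shape (len(zeta), n)
    return (Y * eta - np.exp(eta) - LOG_Y_FACT).sum(axis=1) - 0.5 * zeta ** 2


def mode_and_scale(beta, sigma):
    """Newton-Raphson for the mode of F; returns (zeta_hat, Sigma_hat)."""
    zeta, n, s = 0.0, len(Y), float(Y.sum())
    for _ in range(50):
        mu = math.exp(beta + sigma * zeta)
        step = (sigma * (s - n * mu) - zeta) / (-n * mu * sigma ** 2 - 1.0)
        zeta -= step
        if abs(step) < 1e-12:
            break
    return zeta, 1.0 / (n * math.exp(beta + sigma * zeta) * sigma ** 2 + 1.0)


def agq_loglik(beta, sigma, Q):
    """log L~^(Q): Q Gauss-Hermite nodes centred at zeta_hat, scaled by Sigma_hat."""
    zeta_hat, s_hat = mode_and_scale(beta, sigma)
    x, w = hermgauss(Q)                              # rule for weight exp(-x^2)
    scale = math.sqrt(2.0 * s_hat)
    log_terms = np.log(w) + x ** 2 + F(zeta_hat + scale * x, beta, sigma)
    m = log_terms.max()
    log_integral = m + math.log(np.exp(log_terms - m).sum() * scale)
    return log_integral - 0.5 * math.log(2.0 * math.pi)


def laplace_loglik(beta, sigma):
    """(1/2) log Sigma_hat + F(zeta_hat), for comparison with Q = 1."""
    zeta_hat, s_hat = mode_and_scale(beta, sigma)
    return 0.5 * math.log(s_hat) + float(F(zeta_hat, beta, sigma)[0])


if __name__ == "__main__":
    for Q in (1, 3, 5, 10, 20):
        print(Q, agq_loglik(1.2, 0.3, Q))
    print("Laplace:", laplace_loglik(1.2, 0.3))
```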
The AGQ approximation to the $j$th integral is denoted by $\tilde L_j^{(Q)}(\theta)$, with the corresponding approximation to the overall log-likelihood

$$
\tilde l^{(Q)}(\theta) = \sum_{j=1}^{J} \log \tilde L_j^{(Q)}(\theta). \tag{3.28}
$$
Since $\hat\zeta_j(\theta)$ and $\hat\Sigma_j(\theta)$ are functions of the unknown parameters $\theta$, it is necessary to recompute the Laplace approximation at each iterate of $\theta$ when maximizing (3.28).
Maximizing (3.28) using the optimizer optim in R proved to be very slow and unreliable for our applications. An alternative was to use Fisher scoring or Newton–Raphson updates based on numerical derivatives obtained using the R package numDeriv. This also proved to be very slow. Analytical derivatives require implicit differentiation of $\hat\zeta_j(\theta)$ and $\hat\Sigma_j(\theta)$, which results in complex expressions requiring substantial modification of the current GLARMA software. We next describe an alternative approach that avoids all of these issues.
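First derivatives of the log-likelihood (3.19) with respect to the unknown parameters are

$$
\dot l(\theta) = \frac{\partial l(\theta)}{\partial \theta} = \sum_{j=1}^{J} \dot l_j(\theta), \qquad \dot l_j(\theta) = \frac{1}{L_j(\theta)} \int_{\mathbb{R}^d} \frac{\partial l_j(\theta \mid \zeta)}{\partial \theta} \exp\big(l_j(\theta \mid \zeta)\big)\, g(\zeta)\, d\zeta. \tag{3.29}
$$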
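Equation (3.29) says that each series' score is the conditional score $\partial l_j(\theta\mid\zeta)/\partial\theta$ averaged with weights proportional to $\exp(l_j(\theta\mid\zeta))\,g(\zeta)$. The pure-Python sketch below checks this for the hypothetical toy model ($\theta = \beta$ only, $\sigma$ held fixed, $Z_{jt} = 0$). It evaluates the integrals on a fine midpoint grid, so the comparison with a finite-difference derivative of the log-likelihood tests the identity itself rather than any quadrature error.

```python
import math

# Hypothetical toy model: Poisson counts, log link, one random intercept with
# scale sigma, GLARMA term Z_t = 0, and theta = beta (scalar) for brevity.
Y = [3, 5, 2, 6, 4, 7, 3, 5]
STEP = 0.005
GRID = [-10.0 + (k + 0.5) * STEP for k in range(4000)]   # midpoints on [-10, 10]


def cond_loglik(beta, sigma, zeta):
    """l_j(theta | zeta) for the toy model."""
    w = beta + sigma * zeta
    return sum(y * w - math.exp(w) - math.lgamma(y + 1) for y in Y)


def cond_score(beta, sigma, zeta):
    """d l_j(theta | zeta) / d beta = sum_t (y_t - mu_t), since x_t = 1."""
    mu = math.exp(beta + sigma * zeta)
    return sum(y - mu for y in Y)


def log_weights(beta, sigma):
    """log of exp(l_j(theta | zeta)) g(zeta) at each grid point."""
    return [cond_loglik(beta, sigma, z) - 0.5 * z * z - 0.5 * math.log(2 * math.pi)
            for z in GRID]


def log_lik(beta, sigma):
    """log L_j(theta) by the midpoint rule, with log-sum-exp for stability."""
    lw = log_weights(beta, sigma)
    m = max(lw)
    return m + math.log(sum(math.exp(v - m) for v in lw) * STEP)


def score_329(beta, sigma):
    """(3.29): weighted average of the conditional score over zeta."""
    lw = log_weights(beta, sigma)
    m = max(lw)
    wts = [math.exp(v - m) for v in lw]
    return sum(wt * cond_score(beta, sigma, z) for wt, z in zip(wts, GRID)) / sum(wts)


if __name__ == "__main__":
    h = 1e-5
    fd = (log_lik(1.2 + h, 0.3) - log_lik(1.2 - h, 0.3)) / (2 * h)
    print(score_329(1.2, 0.3), fd)
```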
Second derivatives, $\ddot l_j(\theta)$, are also easy to derive and involve more integrals to be approximated. For any fixed $\zeta$, the integrands in these derivative expressions can be calculated recursively using the unpackaged form of the single series GLARMA software. If $S$ denotes the number of parameters in $\theta$, then there are $J \times \big(1 + S + 2S(S+1)/2\big) = J(1+S)^2$ $d$-dimensional integrals to calculate in order to implement the Newton–Raphson method. For instance, for the final model for the BAC example (Model RE-III) with two uncorrelated random effects, we have $J = 17$ and $S = 10$, requiring calculation of 2057 two-dimensional ($d = 2$) integrals at each step of the Newton–Raphson iterations. Fisher scoring is not available here because the log-likelihood is a sum over only $J$ independent series: an outer-product approximation would be built from just $J$ first-derivative vectors (so its rank is at most $J$), giving an ill-conditioned approximation to the second derivative matrix unless $J$ is quite large.
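The integral count is simple arithmetic, and the small helper below (hypothetical names) reproduces the figure of 2057 for $J = 17$, $S = 10$. One way to read the count is as follows: per series, there is one integral for $L_j$, $S$ for the score, and $S(S+1)/2$ for each of two symmetric $S \times S$ matrices entering the second derivative. The helper also reports $J Q^d$, the number of node evaluations when each series uses $Q$ nodes per coordinate and the same nodes serve all of that series' integrals.

```python
def n_integrals(J: int, S: int) -> int:
    """d-dimensional integrals per Newton-Raphson step: J * (1 + S + 2 * S(S+1)/2)."""
    return J * (1 + S + 2 * (S * (S + 1) // 2))


def n_node_evaluations(J: int, Q: int, d: int) -> int:
    """Integrand evaluations if every series uses Q nodes per coordinate and
    the same Q**d nodes serve all of that series' integrals."""
    return J * Q ** d


if __name__ == "__main__":
    J, S, d = 17, 10, 2      # Model RE-III: 17 states, 10 parameters, 2 random effects
    print(n_integrals(J, S), J * (1 + S) ** 2)
    for Q in (1, 3, 5):      # illustrative node counts, not values from the text
        print(Q, n_node_evaluations(J, Q, d))
```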
In our experience, for long longitudinal data applications, the Laplace approximation can provide quite accurate single-point approximations to the integrals required for the likelihood itself. However, the integrands for the first and second derivatives are not necessarily positive, nor are they unimodal, so a single-point integral approximation is inadequate. AGQ, in contrast, can provide multipoint approximations for the integrals required for derivatives. In our experience, surprisingly few quadrature points are required to get approximations to the likelihood and the first and second derivatives that are sufficiently accurate for convergence to the optimum of the likelihood and that provide accurate standard errors for inferential purposes. We denote the estimates of $\dot l(\theta)$ and $\ddot l(\theta)$ obtained by applying AGQ with $Q$ nodes by $\tilde{\dot l}^{(Q)}(\theta)$ and $\tilde{\ddot l}^{(Q)}(\theta)$, respectively. The same quadrature points and weights that are used for $\tilde l^{(Q)}(\theta)$ are also used to obtain $\tilde{\dot l}^{(Q)}(\theta)$ and $\tilde{\ddot l}^{(Q)}(\theta)$, using one pass of the GLARMA software.