209 Model Validation and Diagnostics
the next two experiments in this section, we discuss the results from a single simulation run
where the sample size is set to 50,000 in order to limit the impact of sampling uncertainty
on the results. In this case, the generated data are analyzed under two different scenarios:
(a) a GP(1) estimated model corresponding to the true DGP and (b) a PINAR(1) estimated
model where the innovation distribution is erroneously taken to be Poisson and binomial
thinning is assumed.
Figure 9.9 provides three graphs associated with relevant diagnostic tools. Graphs for
neither the ACF of the $\{u_t^{+}\}$ nor the parametric bootstrap are provided, since they are
qualitatively similar to the ACF of the Pearson residuals and provide no added insight.
None of the graphical diagnostics depicted in the top panels of Figure 9.9 suggest that
the GP(1) model is inadequate for the data. However, this is not the case when attention is
focused on the lower row of panels in Figure 9.9. Here, the results of erroneously assuming
an equidispersed Poisson innovation distribution (and binomial thinning) are clearly evident.
The distributional misspecification is seen in the U-shaped PIT histogram and an $F_m(u^*)$
chart (in the bottom row, third column of the figure), which deviates from the 45° line
(compare the corresponding figure in the row above). In addition, the correlogram of the
Pearson residuals indicates misspecification with respect to the dynamics of the generated
data. This result is explained by the fact that the maximum likelihood estimate for the
parameter $\alpha_1$ in the PINAR(1) model is biased downward. Therefore, the strength of the
dependence in the data is underestimated, resulting in residual serial correlation remaining
in the Pearson residuals; this is depicted in the left-hand panel in the lower row of
Figure 9.9.
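The PIT-based diagnostics discussed above can be sketched in a few lines. The block below is a minimal illustration, assuming the nonrandomized PIT construction in which each observation contributes a conditional distribution function that rises linearly between P(Y_t <= y_t - 1) and P(Y_t <= y_t); the function name `pit_histogram` and its arguments are hypothetical and are not taken from the chapter. The averaged function evaluated on a grid is presumably what the $F_m(u^*)$ chart displays, and its increments give the histogram heights.

```python
def pit_histogram(cdf_at_y, cdf_at_y_minus_1, n_bins=10):
    """Bin heights of a nonrandomized PIT histogram (illustrative sketch).

    cdf_at_y[t]         : fitted P(Y_t <= y_t | past)
    cdf_at_y_minus_1[t] : fitted P(Y_t <= y_t - 1 | past), 0.0 when y_t == 0
    """
    def mean_pit(u):
        # Average over t of the conditional PIT distribution function at u.
        total = 0.0
        for upper, lower in zip(cdf_at_y, cdf_at_y_minus_1):
            if u >= upper:
                total += 1.0
            elif u <= lower:
                total += 0.0
            else:
                total += (u - lower) / (upper - lower)
        return total / len(cdf_at_y)

    # Mean PIT on an equally spaced grid; under a correct model it should
    # lie close to the 45-degree line.
    grid = [mean_pit(j / n_bins) for j in range(n_bins + 1)]
    # Histogram heights are the increments of the mean PIT across bins.
    return [grid[j + 1] - grid[j] for j in range(n_bins)]
```

Under a correctly specified model the heights should all be close to 1/n_bins; a U-shaped pattern, as in the lower row of Figure 9.9, points to a predictive distribution that is too narrow.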
A summary of numerical results is given in Table 9.3. It can be seen that, in contrast
with that for the correct specification, the sample variance of the Pearson residuals from
the PINAR(1) is considerably larger than one (1.3171), indicating that not all the dispersion
in the generated data has been accounted for in the fitted specification. Note also that the
scoring rules and the information criteria of the two fitted models uniformly indicate a
preference for the true GP(1) model for the data over the PINAR(1) one. For some of these
statistics, including the variance of the Pearson residuals, both information criteria and the
p-values of G, the evidence is emphatic.
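As a rough illustration of how the entries of such a summary table arise, the sketch below computes a Pearson residual and three scoring rules from a one-step-ahead predictive probability mass function. The definitions are assumptions: the logarithmic score is taken as minus the log predictive probability, the quadratic score in its Brier form, and the ranked probability score as the sum of squared differences between the predictive and empirical distribution functions. The exact normalization behind Table 9.3 is not restated in this section, and the function names are hypothetical.

```python
import math

def scores(pmf, y):
    """Return (logs, qs, rps) for a predictive pmf over 0..K and outcome y.

    Assumed definitions (smaller is better for all three):
      logs = -log p(y)
      qs   = sum_k (1{y == k} - p_k)^2          (Brier form)
      rps  = sum_k (P(Y <= k) - 1{y <= k})^2
    """
    logs = -math.log(pmf[y])
    qs = sum(((1.0 if k == y else 0.0) - p) ** 2 for k, p in enumerate(pmf))
    cdf, rps = 0.0, 0.0
    for k, p in enumerate(pmf):
        cdf += p
        rps += (cdf - (1.0 if y <= k else 0.0)) ** 2
    return logs, qs, rps

def pearson_residual(pmf, y):
    """Standardize y by the mean and variance of the predictive pmf."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return (y - mean) / math.sqrt(var)
```

Averaging each score over all time points gives one number per fitted model, and the sample variance of the Pearson residuals should be close to one when the dispersion is modeled adequately.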
The second experiment specifically targets a model misspecification with respect to the
predictive distribution. The data are generated using an INAR(1) of the form (9.2) with
binomial thinning, but with GP innovations in truth. Such a model is discussed by Jung
and Tremayne (2011b), where it is indicated that the resultant counts exhibit marginal
overdispersion, but are not GP, because closure under convolution does not apply.
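A data-generating mechanism of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: the recursion is taken to be the standard INAR(1) form, X_t = (binomial thinning of X_{t-1}) + innovation; the generalized Poisson pmf is taken as λ(λ + ηx)^(x-1) exp(-λ - ηx)/x! with 0 <= η < 1; and the default parameter values are illustrative only, since the values used in the first experiment are not restated in this section. All function names are hypothetical.

```python
import math
import random

def gp_pmf(x, lam, eta):
    """Generalized Poisson pmf (assumed parameterization, 0 <= eta < 1)."""
    return math.exp(math.log(lam) + (x - 1) * math.log(lam + eta * x)
                    - lam - eta * x - math.lgamma(x + 1))

def sample_gp(rng, lam, eta):
    """Draw one generalized Poisson variate by inversion of the cdf."""
    u, x, cdf = rng.random(), 0, gp_pmf(0, lam, eta)
    while u > cdf and x < 10_000:  # hard cap guards against rounding issues
        x += 1
        cdf += gp_pmf(x, lam, eta)
    return x

def simulate_inar1_gp(n, alpha=0.4, lam=0.5, eta=0.2, seed=0, burn_in=500):
    """Simulate an INAR(1) path with binomial thinning and GP innovations."""
    rng = random.Random(seed)
    x, path = 0, []
    for t in range(n + burn_in):
        # Binomial thinning: each of the x previous counts survives w.p. alpha.
        survivors = sum(rng.random() < alpha for _ in range(x))
        x = survivors + sample_gp(rng, lam, eta)
        if t >= burn_in:
            path.append(x)
    return path
```

With these illustrative values the simulated counts have a sample variance exceeding the sample mean, in line with the marginal overdispersion noted above.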
For this simulation experiment, we employ the same set of parameter values for $\alpha_1$,
$\lambda$, and $\eta$ as in the first experiment. Also, we use the same estimated models as
mentioned earlier, that is, (1) a GP(1) model based on the random operator $R_t(\cdot)$ of Joe
(1996) and (2) a PINAR(1) model based on binomial thinning, in which the parameters
estimated are the dependence parameter and the innovation mean under a Poisson
innovation assumption. Note that neither fitted model is correct. Figure 9.10 provides
graphs associated with various diagnostic tools. Summary numerical results are given in
Table 9.4.
Note that both fitted models are misspecified, since the first implies a misspecified thinning
mechanism (the marginal distribution of the data will be overdispersed but will not be
GP) and the second assumes an incorrect innovation, because the likelihood is based on a
Poisson assumption for innovations. This is reflected at various junctures in the diagnostic
analysis.
210
Handbook of Discrete-Valued Time Series
[Figure 9.9: two rows of three panels each. Panels in each row: ACF of the Pearson residuals (horizontal axis: lags), PIT histogram, and $F_m(u^*)$ chart (horizontal axis: $u^*$). Row (a): GP(1) estimated model; row (b): PINAR(1) estimated model.]
FIGURE 9.9
Graphical results for the first Monte Carlo experiment. (a) GP(1) estimated model and (b) PINAR(1) estimated
model.
TABLE 9.3
Summary of Numerical Results for the First Monte Carlo Experiment

                      GP(1) Model    PINAR(1) Model
Pearson residual
  Mean                0.0008         0.0137
  Variance            1.0169         1.3171
Scoring rules
  logs                1.7347         1.7689
  qs                  0.7816         0.7896
  rps                 0.8119         0.8227
AIC                   86735.49       88444.71
BIC                   86748.72       88452.53
G                     11.584         777.8477
  p-value             (0.2378)       (< 0.000)
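The p-value reported for G under the GP(1) model can be reproduced numerically if G is treated as a Pearson chi-square statistic for uniformity of a 10-bin PIT histogram and referred to a chi-square distribution with 9 degrees of freedom. Both the form of the statistic and the degrees of freedom are inferences from the printed numbers, not statements made in this section, and the helper names below are hypothetical. The sketch uses only the standard library, relying on the closed-form tail probability for integer degrees of freedom.

```python
import math

def g_statistic(bin_heights, n_obs):
    """Pearson chi-square statistic for uniformity of PIT histogram heights
    (assumed form of G)."""
    expected = 1.0 / len(bin_heights)
    return n_obs * sum((h - expected) ** 2 / expected for h in bin_heights)

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variate with integer df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite sum of odd half-powers of x.
    root = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    total, term = 0.0, root
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * total

# Value of G for the GP(1) model taken from Table 9.3; df = 9 is an assumption.
p_value_gp1 = chi2_sf(11.584, 9)
```

Under these assumptions `p_value_gp1` agrees with the tabulated 0.2378 to the printed precision, which supports, but does not prove, the 10-bin reading.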
Starting with the second set of results related to the PINAR(1) estimated model, we
see that, due to a downward bias in the ML estimation of the dependence parameter,
there are obvious unwanted spikes in the correlogram of the Pearson residuals. Both the
(nonrandomized) PIT histogram and the $F_m(u^*)$ chart indicate some misspecification in the
distributional assumption. From Table 9.4, it can be seen that the variance of the Pearson
residuals (at 1.2547) is considerably larger than unity and the G-statistic decisively rejects
uniformity of the PIT histogram.
Interpreting the first set of results related to the GP(1) estimated specification, where the
misspecification is essentially due to the thinning operator assumed, is less straightforward. We
reiterate (Jung and Tremayne 2011b) that the degree of overdispersion in the innovations is
attenuated in the true marginal distribution of the observations by the binomial thinning
operation used in the data-generating mechanism. The estimated GP(1) model is able to
capture the dependence structure in the data, as reflected by a Pearson residual correlogram
that shows no evidence of remaining serial correlation (top row, first
column of the figure). Also, the variance of the Pearson residuals from the estimated model
is larger than one, but only marginally so. Diagnostic results related to other aspects of the
specification do tentatively suggest model misspecification in that the (nonrandomized)
PIT histogram and the $F_m(u^*)$ chart exhibit limited unwanted features. In particular, the
former shows some departure from uniformity, a conclusion backed up by the
goodness-of-fit statistic G and its associated p-value. Overall, the results displayed in Figure 9.10 and
Table 9.4 suggest a preference for the GP(1) specification over the PINAR(1) one, but there
may be doubt about whether or not the former is fully data coherent.
The third experiment is designed to reflect underspecification of dynamics in the
estimated model. Data are generated using a second-order integer autoregressive model with
GP innovations and the operator due to Joe (1996); by closure under convolution, the
marginal distribution of the generated counts is GP. See the discussion of the previous
subsection relating to the preferred model for the iceberg data set for further information on
this specification. The following parameter values are used in the experiment: for the
dependence parameter vector $\alpha$, $\alpha_1 = 0.4$, $\alpha_2 = 0.25$, and $\alpha_3 = 0.1$; $\lambda = 0.5$; and $\eta = 0.2$, leading to a
process mean of 2.5 and first- and second-order autocorrelations of 0.45 + 0.1 = 0.55 and
0.25 + 0.1 = 0.35, respectively.
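The stated process mean can be checked with a short calculation. It assumes, as is standard but not restated in this section, that a GP($\lambda$, $\eta$) innovation has mean $\lambda/(1-\eta)$ and that the stationary mean of the second-order model equals the innovation mean divided by $1 - \alpha_1 - \alpha_2 - \alpha_3$:

```latex
\[
  \mu \;=\; \frac{\lambda}{(1-\eta)\,(1-\alpha_1-\alpha_2-\alpha_3)}
      \;=\; \frac{0.5}{(1-0.2)\,(1-0.4-0.25-0.1)}
      \;=\; \frac{0.5}{0.8 \times 0.25}
      \;=\; 2.5 .
\]
```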
[Figure 9.10: two rows of three panels each. Panels in each row: ACF of the Pearson residuals (horizontal axis: lags), PIT histogram, and $F_m(u^*)$ chart (horizontal axis: $u^*$). Row (a): GP(1) estimated model; row (b): PINAR(1) estimated model.]
FIGURE 9.10
Graphical results for the second Monte Carlo experiment. (a) GP(1) estimated model and (b) PINAR(1) estimated
model.
TABLE 9.4
Summary Numerical Results for the Second Monte Carlo Experiment

                      GP(1) Model    PINAR(1) Model
Pearson residual
  Mean                0.0018         0.0090
  Variance            1.0534         1.2547
Scoring rules
  logs                1.6927         1.7111
  qs                  0.7724         0.7768
  rps                 0.7661         0.7704
AIC                   84631.63       85550.68
BIC                   84644.86       85563.91
G                     244.089        611.650
  p-value             (< 0.000)      (< 0.000)
Two different models are fitted to the generated data: (1) a GP(2) (correct) estimated
model and (2) a misspecified GP(1) model, so both estimated models utilize the thinning
operator of Joe (1996). Figure 9.11 displays graphs associated with some of the
diagnostic tools discussed for the latter case only, since the fitted GP(2) model evidences
no misspecification. Summary numerical results for both estimated models are given in
Table 9.5.
It is evident from the (nonrandomized) PIT histogram and the $F_m(u^*)$ chart that the
estimated GP(1) model is able to capture the distributional assumption correctly.
However, the correlogram of the Pearson residuals indicates misspecified dynamics in this
underspecified fitted model. In particular, it shows Pearson residual autocorrelations of
the GP(1) model that decay exponentially (after the third). This arises because the
autocorrelations of the data themselves exhibit a more complicated persistence pattern than a
first-order model can account for. From the numerical results displayed in Table 9.5, it can
be seen, as is to be expected, that all the scoring rules and the information criteria clearly
favor the GP(2) estimated model over the first-order counterpart. Note, however, that the
summary statistics relating to the Pearson residuals and the G-statistic do not indicate
model misspecification.
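The correlogram check used throughout these experiments can be sketched as below. This is a minimal illustration, assuming the usual sample autocorrelation estimator and the approximate 95% white-noise band of ±1.96/√n; the function names are hypothetical and the band is a conventional choice rather than one stated in this section.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / (n * c0)
            for k in range(1, max_lag + 1)]

def white_noise_band(n):
    """Approximate 95% band for autocorrelations of an uncorrelated series."""
    return 1.96 / math.sqrt(n)
```

Applied to the Pearson residuals of a fitted model, autocorrelations lying well outside the band at low lags suggest dynamics that the model has not captured. For n = 50,000 the band is roughly ±0.009, so even small residual autocorrelations are detectable at this sample size.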
Finally, we conduct a fourth experiment by generating data from a second-order integer
autoregressive model with Poisson innovations. Data are again generated using the
Joe (1996) thinning operator and are of the form (9.1) with p = 2. Instead of using a
constant innovation rate as in the earlier experiments, we employ a time-varying innovation
rate designed to capture (deterministic) seasonality, as is often done in empirical work. Suppose
the innovation rate $\lambda_t$ to be time varying using harmonics given by

$$\lambda_t = \exp\left\{\theta_1 + \theta_2 \sin\left(\frac{2\pi t}{200}\right) + \theta_3 \cos\left(\frac{2\pi t}{200}\right)\right\} \qquad (9.12)$$

and choose the following set of parameters: $\alpha_1 = 0.4$, $\alpha_2 = 0.25$, $\alpha_3 = 0.1$, $\theta_1 = 0.05$,
$\theta_2 = -0.2$, and $\theta_3 = 0.2$. The harmonics introduce additional dynamic effects in the
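The time-varying innovation rate in (9.12) can be evaluated directly. The sketch below uses the parameter values quoted for the fourth experiment; the function name `innovation_rate` and the `period` argument are hypothetical conveniences, with the period fixed at 200 as in the equation.

```python
import math

def innovation_rate(t, theta1=0.05, theta2=-0.2, theta3=0.2, period=200):
    """Innovation rate lambda_t from equation (9.12)."""
    angle = 2.0 * math.pi * t / period
    return math.exp(theta1 + theta2 * math.sin(angle) + theta3 * math.cos(angle))

# One full seasonal cycle of innovation rates.
rates = [innovation_rate(t) for t in range(1, 201)]
```

Because the harmonic term has amplitude sqrt(0.2^2 + 0.2^2), about 0.283, the rate oscillates between roughly exp(0.05 - 0.283) and exp(0.05 + 0.283) over each cycle of 200 observations.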