9
Model Validation and Diagnostics
Robert C. Jung, Brendan P.M. McCabe, and A.R. Tremayne
CONTENTS
9.1 Introduction
9.2 Parametric Resampling
9.3 Residual Analysis
    9.3.1 Pearson Residuals
    9.3.2 Component Residuals
    9.3.3 Overdispersion and the Information Matrix Test
9.4 Analyses Based on the Predictive Distributions
    9.4.1 PIT Histograms for Discrete Data
    9.4.2 Scoring Rules and Model Selection
    9.4.3 Cuts and Iceberg Data Revisited
9.5 Evidence with Artificial Data
9.6 Conclusions
References
9.1 Introduction
Checking the adequacy of a specified model is an important part of any iterative modeling
exercise in applied time series analysis. For linear Gaussian time series models, or those
based on the framework of generalized linear models, there exist well-developed tools for
this purpose that are readily available and routinely employed in applied work. However,
for nonlinear time series models for discrete data, this is not the case. Nevertheless, the
need to compare two or more competing model specifications, or evaluate the adequacy of
fit of a chosen model, is obvious.
To help address this gap, we suggest a range of diagnostic and model validation methods
designed to lead to data coherent models that achieve good probabilistic forecasting
outcomes. We borrow these methods from the associated literature developed mainly for
continuous variables, adapting them where necessary for the discrete context. This leads
us to advocate a set of graphical tools and other calibration methods of various kinds.
However, achieving the desired aim may not be straightforward, as the following quote
taken from the important paper of Tsay (1992, p. 2) indicates: "However, it is well known that
the best model with respect to one checking criterion may fare badly with respect to another criterion.
... Consequently, there is a need to specify the objective of data analysis before choosing a checking
criterion. ... Without mentioning objectives, reported model checking statistics are meaningless or
could be misleading." As suggested earlier, our standpoint is that the model class to be
introduced next is primarily of use in a probabilistic forecasting sense, where the aim is to
provide not only point forecasts but also good estimates of entire forecast distributions.
Hence, we focus our coverage on methods that may help to achieve good outcomes
in this respect.
We now briey introduce the class of integer autoregressive models that will be used as
a vehicle to demonstrate the application of the diagnostic tools described in the chapter.
When used for other model classes presented in this volume, appropriate adaptations may
be necessary. An integer autoregressive process {X
t
; t = 0, ±1, ±2, ...} of order p dened on
the state space of nonnegative integers is of the form
X
t
= R
t
(F
t1
; α) + ε
t
, (9.1)
where F_{t-1} indicates the relevant past history of X_t to be conditioned on, typically
X_{t-1}, ..., X_{t-p} in a pth-order model, and the ε_t are a sequence of i.i.d. discrete random
variables. The innovation process ε_t and F_{t-1} are presumed to be stochastically independent
for all points in time. This model specification is inspired by the work of Joe (1996), from
which it follows, inter alia, that it is often a Markov chain (of some order).
In (9.1), R_t(·) denotes a random operator to be applied at time t (which may differ from
specification to specification) that carries the dependence structure and preserves the integer
nature of the process. Some practical examples of these random operators are given
in the following. Perhaps unsurprisingly, alternative choices of the operator R_t(·) and the
innovations ε_t lead to a rich class of models; see, for example, the survey by McKenzie
(2003). A variant of (9.1) that is popular in the literature is the following:
    X_t = α_1 ∘ X_{t-1} + α_2 ∘ X_{t-2} + ··· + α_p ∘ X_{t-p} + ε_t,    (9.2)
where, conditional on X_{t-k}, α_k ∘ X_{t-k} is an integer-valued random variable (using the
operator ∘) with parameter α_k (possibly a vector). The conditional variables α_k ∘ X_{t-k},
k ∈ {1, ..., p}, are mutually independent and independent of the i.i.d. innovation sequence ε_t.
The operator ∘ thus delivers an integer value, and dependence in X_t is induced via the
conditioning variables X_{t-k}, k ∈ {1, ..., p}. The operator used in α_k ∘ X_{t-k} may correspond to
binomial thinning and ε_t to a Poisson variable with parameter λ. Then the conditional
variables α_k ∘ X_{t-k}, k ∈ {1, ..., p}, have independent binomial distributions with parameters α_k
and X_{t-k}. Another possibility is that, conditional on X_{t-k}, α_k ∘ X_{t-k} is beta-binomial, while
ε_t is negative binomial. For all these pth-order model variants of the form (9.2), the acronym
INAR(p) has been introduced.
The special case of (9.1) when p = 1 is of importance. Under binomial thinning and
Poisson innovations, X_t has a Poisson marginal distribution because closure under
convolution applies. This is probably the workhorse model of integer time series modeling.
As it can be written in the form of (9.2), it will henceforth be denoted a PINAR(1) model.
If, however, ε_t were to be generalized Poisson (GP) random variables, then, to preserve a
GP marginal distribution for X_t, the random operator R_t(·), conditional on X_{t-1}, would
yield a quasi-binomial distribution; such a model will be denoted GP(1) in the following.
Further, if ε_t is negative binomial and, conditional on X_{t-1}, R_t(·) is beta-binomial,
then X_t also has a negative binomial marginal distribution (see, e.g., McKenzie, 2003,
p. 586).
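To make the thinning mechanism concrete, the following short sketch (our own illustration in Python, not code from the chapter; the function name simulate_inar and its defaults are our choices) simulates an INAR(p) path of the form (9.2) with binomial thinning and Poisson innovations; taking p = 1 gives the PINAR(1) model just described.

import numpy as np

def simulate_inar(T, alphas, lam, seed=None, burn_in=200):
    """Simulate T observations from an INAR(p) process of the form (9.2),
    using binomial thinning operators alpha_k and Poisson(lam) innovations."""
    rng = np.random.default_rng(seed)
    p = len(alphas)
    x = np.zeros(T + burn_in + p, dtype=int)
    for t in range(p, len(x)):
        # Binomial thinning: conditional on X_{t-k}, alpha_k o X_{t-k} ~ Binomial(X_{t-k}, alpha_k).
        thinned = sum(rng.binomial(x[t - k], alphas[k - 1]) for k in range(1, p + 1))
        x[t] = thinned + rng.poisson(lam)  # add the i.i.d. innovation epsilon_t
    return x[burn_in + p:]

# Example: a PINAR(1) path with alpha_1 = 0.8 and lambda = 0.4 (theoretical mean 2).
y = simulate_inar(500, [0.8], 0.4, seed=1)
print(y.mean(), y.var())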
In what follows we use the PINAR(1) as a first specification to fit to two real-life data
sets and consider at various subsequent points in the chapter the evidence that this simple
specification needs to be elaborated. We seek to reveal data coherent models for each data
set in the light of our diagnostic analyses.
Of course, there are a number of avenues that can be used to assess the evidence against
the suitability of a specified model. Issues to be considered include (but would not necessarily
be limited to) the type of random operator R_t(·) chosen, the relevant past history F_{t-1},
the distributional properties of ε_t, and the need to introduce regression effects in some way.
Evidently, the third of these might be obviated by using the semiparametric approach of
McCabe et al. (2011), but this can introduce added complications when looking at the last,
and so we do not consider this approach further in our contribution.
The plan of the chapter is as follows. Sections 9.2 through 9.4 provide a description of the
diagnostic methods surveyed, together with the results of their application to two real data
sets. In Section 9.5, we use simulated data to highlight specific properties of the various
methods discussed. Finally, Section 9.6 contains concluding remarks.
9.2 Parametric Resampling
A very general informal approach to model diagnostics for time series is proposed by Tsay
(1992). He demonstrates the procedure by employing the sample spectral density function
of any process as a functional of interest. This is closely related to the (sample) autocorrelation
function, (S)ACF, to be used here, since it is a cosine transformation of the spectrum.
The flexibility of Tsay's approach stems from the fact that it not only provides an overall
evaluation of the fitted model but also can be tailored to meet certain specific needs of the
analysis. The procedure is widely applicable and rests on a fairly minimal set of requirements.
Although bootstrap methods are ubiquitous, the caveat that they often do depend
on asymptotic theory (and sometimes on distributional assumptions) is in order. In our
context, the approach emphasizes reproducibility in fitted models and is designed to provide
an overall evaluation of fit or to check special characteristics of a process. Moreover, the
approach can be readily applied to time series models of counts, as it is straightforward
to implement the data-generating process (DGP) of most of them in standard software
packages.
Only the following requirements need to be fulfilled for the implementation of Tsay's
proposal: a parametric model of mathematical form with given parameters and a specified
distribution for innovations; and one, or more, characteristics or functionals that encapsulate
special features of interest. No further restrictions, other than that the model can
be used to generate bootstrap samples, apply. Based on artificially generated sample processes,
an empirical distribution of the specified functional (in our case, ordinates of sample
autocorrelation functions) is obtained. The adequacy of a fitted model is then assessed by
comparing this empirical distribution to the corresponding functional quantity of the data
itself. A model may be regarded as adequate if it successfully reproduces the observed
characteristics of the actual data. Specifically, for each fixed lag of the autocorrelation function,
the 100(1 − α/2)% and 100(α/2)% quantiles (we use α = 0.05 for graphical displays in what
follows) can be computed to constitute the bounds of an acceptance envelope. If the sample
autocorrelations of the data predominantly lie within the acceptance envelopes, the fitted
model can be deemed adequate according to the functional chosen. Notice that this is not
an interval estimation procedure as such, so one cannot reason that such an envelope will
contain the true value of any functional 100(1 − α)% of the time in repeated applications;
see Tsay (1992, Sec. 2.2) for related discussion.
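As an illustration of how such an envelope might be computed, the sketch below (again our own Python code, not the authors'; it reuses the simulate_inar helper from the earlier sketch, and alpha_hat and lam_hat stand for whatever fitted parameter values are available) resamples B series from a fitted PINAR(1) and forms per-lag acceptance bounds for the SACF ordinates.

import numpy as np

def sacf(x, max_lag):
    """Sample autocorrelations at lags 1, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, max_lag + 1)])

def tsay_envelope(data, alpha_hat, lam_hat, B=5000, max_lag=4, level=0.05, seed=None):
    """Parametric resampling: simulate B series from the fitted PINAR(1), compute the
    SACF ordinates for each, and return the per-lag acceptance bounds together with a
    flag indicating whether the data's SACF lies inside them."""
    rng = np.random.default_rng(seed)
    boot = np.empty((B, max_lag))
    for b in range(B):
        # reuse simulate_inar from the earlier sketch to draw one bootstrap path
        xb = simulate_inar(len(data), [alpha_hat], lam_hat, seed=rng.integers(2**31))
        boot[b] = sacf(xb, max_lag)
    lower = np.quantile(boot, level / 2, axis=0)
    upper = np.quantile(boot, 1 - level / 2, axis=0)
    observed = sacf(data, max_lag)
    return lower, upper, (observed >= lower) & (observed <= upper)

With level = 0.05 this mirrors the 95% acceptance envelope used for the graphical displays described above.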
As we shall use this parametric resampling procedure regularly in this chapter as a tool
to assess a fitted model's adequacy, it seems appropriate to first examine how the procedure
operates in a stylized setting. We therefore conduct pilot Monte Carlo experiments
in which the model fitted to artificially generated data is a PINAR(1). The data are
generated in two ways: first, when the true model is fitted, that is, the data themselves follow
a PINAR(1) process in truth; and, second, when the true DGP is an INAR(2) of the
form (9.2) with Poisson innovations. In the former case, the mean and variance of the
marginal distribution of the data are equal and the autocorrelation function is the same as
that of the Gaussian AR(1) continuous counterpart. In the second case, the true marginal
distribution of the data is not Poisson, there is some overdispersion, and the true autocorrelation
function of the process is equivalent to that of a Gaussian AR(2) process. We
anticipate that application of the Tsay procedure under the first scenario will indicate no
model misspecification, and the contrary under the second.
The functionals that we use in this illustrative experiment are as follows: the variance
and the first four ordinates of the autocorrelation function. Artificial data are generated
from the two specifications, and the relevant sample functionals, the sample variance and
the first- through fourth-order sample autocorrelations, denoted SACF(1)–SACF(4) in the
following, are calculated for the generated data. A PINAR(1) model is fitted by maximum
likelihood (ML) and, using the resultant parameter estimates, B bootstrap samples are generated
and the same sample functionals computed. We determine the percentage of times
the functionals of the data are covered by a 100(1 − α)% probability interval (for α = 0.01, 0.05,
and 0.1) constructed from the bootstrap replicates of the resampling procedure. This parallels
the procedure described by Tsay (1992, p. 4), and is repeated R times to provide an
indication of the performance of the procedure.
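As one way to carry out the ML fitting step, the sketch below (our own code, with function names of our own choosing; it implements a conditional ML variant that conditions on the first observation, a common simplification) uses the fact that the one-step conditional distribution of the PINAR(1) is the convolution of a Binomial(X_{t-1}, α_1) thinning term and a Poisson(λ) innovation.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom, poisson

def pinar1_cond_pmf(y, x_prev, alpha, lam):
    """P(X_t = y | X_{t-1} = x_prev): convolution of the Binomial(x_prev, alpha)
    survivors and a Poisson(lam) innovation."""
    j = np.arange(min(x_prev, y) + 1)
    return np.sum(binom.pmf(j, x_prev, alpha) * poisson.pmf(y - j, lam))

def pinar1_negloglik(params, x):
    alpha, lam = params
    if not (0.0 < alpha < 1.0) or lam <= 0.0:
        return np.inf  # penalize parameter values outside the admissible region
    return -sum(np.log(pinar1_cond_pmf(x[t], x[t - 1], alpha, lam))
                for t in range(1, len(x)))

def fit_pinar1(x, start=(0.5, 1.0)):
    """Conditional ML estimates (alpha_hat, lambda_hat) for the PINAR(1)."""
    res = minimize(pinar1_negloglik, start, args=(np.asarray(x),), method="Nelder-Mead")
    return res.x

# Example usage: alpha_hat, lam_hat = fit_pinar1(y)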
In the rst experiment, data series of length T = 500 are generated from a PINAR(1)
model with the following parameter values: α
1
= 0.8 and λ = 0.4. This leads to Poisson
distributed count time series with (theoretical) mean and variance of 2 and rst four
autocorrelations 0.8, 0.64, 0.51, and 0.8
4
= 0.41, respectively. The results presented here
are based on R = 1000 replications, and for each generated series, we perform the para-
metric resampling procedure as described in the previous paragraph using B = 5000
replications.
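For reference, the quoted values follow from the standard moment results for the Poisson INAR(1) under binomial thinning:

    E[X_t] = Var(X_t) = λ/(1 − α_1) = 0.4/(1 − 0.8) = 2,    ρ(k) = α_1^k,  so  ρ(1), ..., ρ(4) = 0.8, 0.64, 0.512, 0.4096.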
To provide some information on the sampling variability that can be expected when the
true model is fitted to the data, we present the average quantiles for the sample functionals
over the 1000 replications for the first experiment in the upper panel of Table 9.1. From
this, it is evident that, on average, the sampling distributions of the functionals are centered
quite close to the true values used to generate the data. The lower panel of Table 9.1 shows
what happens if the sample size is reduced from T = 500 to T = 250. Broadly, increased
sampling variability of the anticipated type is seen in the averages of the newly estimated quantiles.
The left panel of Table 9.2 provides the percentages with which the functionals of the
data are covered by the three acceptance bounds used in this experiment when a correct
model is fitted. It is evident that, in all cases, these percentages show that sample functionals
fall outside the envelopes less often than might be expected. We conducted a further
experiment in which the dependence in the generated process was varied (using α_1 = 0.5 and λ = 1) to
see if the results were sensitive to this variation, but they were not. The results indicate that
the Tsay procedure will generally confirm a correctly specified model.
TABLE 9.1
Average Quantiles from R = 1000 Replications for the Monte Carlo Experiment for the Tsay
Resampling Procedure When a True PINAR(1) Model is Fitted for T = 500 (Upper Panel) and
T = 250 (Lower Panel)

Quantile (%)        0.5     2.5     5       50      95      97.5    99.5
T = 500
  Sample variance   1.286   1.417   1.489   1.935   2.537   2.675   2.990
  SACF(1)           0.703   0.726   0.737   0.791   0.836   0.844   0.859
  SACF(2)           0.482   0.518   0.536   0.624   0.702   0.716   0.742
  SACF(3)           0.315   0.359   0.381   0.492   0.549   0.612   0.647
  SACF(4)           0.188   0.237   0.261   0.387   0.506   0.527   0.569
T = 250
  Sample variance   1.074   1.228   1.316   1.895   2.767   2.980   3.451
  SACF(1)           0.652   0.687   0.704   0.782   0.845   0.856   0.874
  SACF(2)           0.403   0.457   0.483   0.610   0.718   0.736   0.770
  SACF(3)           0.221   0.285   0.316   0.474   0.614   0.639   0.684
  SACF(4)           0.088   0.156   0.190   0.367   0.530   0.559   0.613
TABLE 9.2
Inclusion Rates of Sample Functionals from R = 1000 Replications for the Tsay Resampling
Procedure When a PINAR(1) Model Is Fitted and the True DGP Includes PINAR(1) (Left Panel) and
INAR(2) (Right Panel)

                        PINAR(1) DGP               INAR(2) DGP
Acceptance bounds    90%     95%     99%       90%     95%     99%
Sample variance      95.20   98.40   99.80     10.30   15.30   30.00
SACF(1)              93.60   98.60   99.90     23.60   36.60   61.00
SACF(2)              93.80   97.60   99.40      0.00    0.00    0.00
SACF(3)              93.20   96.10   99.30      0.00    0.20    0.50
SACF(4)              92.30   96.30   99.10      0.00    0.10    0.80
On the other hand, when an inadequate model is fitted (refer to the right-hand panel of
Table 9.2), all functionals are able to indicate this on a regular basis. These results are
obtained by using an INAR(2) data-generating mechanism with α_1 = 0.45, α_2 = 0.35, and
λ = 0.4. This generates data that have a true mean of 2, variance 2.95, and first four
autocorrelation ordinates equal to 0.692, 0.662, 0.540, and 0.475, respectively. Again, T = 500,
R = 1000, and B = 5000 are used. Thus, the Tsay procedure does show an ability to detect
an incorrectly fitted model, in this pilot experiment at least. In any instance where it indicates
a model's inadequacy, a search for a more refined (or different) model specification
should be undertaken.
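As a check on the quoted values, the INAR(2) mean and the AR(2)-type Yule–Walker recursions for the autocorrelations (which, as noted above, the INAR(2) shares) give

    E[X_t] = λ/(1 − α_1 − α_2) = 0.4/(1 − 0.45 − 0.35) = 2,
    ρ(1) = α_1/(1 − α_2) = 0.45/0.65 ≈ 0.692,    ρ(k) = α_1 ρ(k−1) + α_2 ρ(k−2) for k ≥ 2,

yielding ρ(2) ≈ 0.662, ρ(3) ≈ 0.540, and ρ(4) ≈ 0.475, as quoted.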
Jung and Tremayne (2011b) have previously applied the method to integer time series
(though without any examination of its empirical performance, limited evidence on which
is provided earlier). Grunwald et al. (1997) report that the procedure is able to discover
some surprising results in the context of Bayesian time series models that would have not ...