a multiplicity of different model specifications for the same data set and it is to issues of
this nature that we now turn.
9.4.2 Scoring Rules and Model Selection
The relative performance of a model within a given group of competing models can be
assessed using scoring rules. Scoring rules, or functions, are regularly used in decision
analysis to measure the quality of probabilistic predictions by assigning a numerical score
based on the predictive distribution and the observed data. They are closely related to (gen-
eralized) entropy measures; see, for example, Jose et al. (2008). An important property of a
scoring rule is propriety. A scoring rule is said to be proper if a forecaster achieves their best
score by predicting according to their true belief about the predictive distribution. A for-
mal denition of this concept and further discussion can be found in Gneiting and Raftery
(2007). Boero et al. (2011) provide a comprehensive evaluation of scoring rules along with
some historical background, and Czado et al. (2009) offer an account of scoring rules in the
context of count data.
In the present framework, scoring rules are used as a model selection tool and are
computed as averages over the relevant set of (in-sample) predictions, say
$(T - p)^{-1} \sum_{t=p+1}^{T} s[F(x_t)]$, where $s[\cdot]$ denotes a generic scoring rule, $x_t$ the observed count, and $F(x_t)$
is defined in the text following (9.10). Scoring rules are, generally, negatively oriented
penalties that one seeks to minimize. The literature has developed a large number of scoring
rules and, unless there is a unique and clearly defined underlying decision problem,
there is no automatic choice of a (proper) scoring rule to be used in any given situation.
Therefore, the use of a variety of scoring rules may be appropriate to take advantage of
specific emphases and strengths.
We have found three proper scoring rules to be particularly useful in comparing time
series models for counts, and these are now introduced as scores per observation to be
aggregated as indicated in the previous paragraph. The first scoring rule we consider is the
logarithmic score. It is defined as the negative of the logarithm of the predictive distribution
evaluated at the observed count, and it is closely related to the classical Shannon entropy
$$\mathrm{logs}(F(x_t \mid F_{t-1})) = -\log p(x_t \mid F_{t-1}),$$
where $p(x_t \mid F_{t-1})$ is the probability mass of the predictive distribution at the observed count.
In contrast to the other scoring rules discussed in the following text, the logarithmic score is
what is called a local scoring rule in that it provides a small value if the observed count is in
the high-density region of the predictive distribution and large values otherwise. Geweke
and Amisano (2011) provide a careful analysis of the properties of weighted linear combina-
tions of prediction models and base their model choice procedure on the minimization of
the logarithmic score.
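A minimal sketch of the logarithmic score for a single observation follows; the Poisson predictive distribution used here is purely illustrative and is not taken from the chapter.

```python
import numpy as np
from scipy.stats import poisson

def log_score(pmf, x):
    """Logarithmic score: minus the log of the predictive mass at the observed count."""
    return -np.log(pmf[x])

# Hypothetical one-step-ahead predictive distribution: Poisson with mean 2.0
pmf = poisson.pmf(np.arange(50), mu=2.0)   # support truncated where the remaining mass is negligible
print(log_score(pmf, x=3))                 # small when x falls in a high-probability region
```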
The quadratic score has been specifically proposed in the assessment of time series
predictions of counts. It involves an augmentation of the information contained in the logarithmic
score by a summary measure from all probability ordinates, denoted by
$\lVert p \rVert^2 = \sum_{j=0}^{\infty} p(j)^2$, where $p(j)$ represents the probability that $x_t = j$ in the probability mass function
of the predictive distribution, and is given by
$$\mathrm{qs}(F(x_t \mid F_{t-1})) = -2\,p(x_t \mid F_{t-1}) + \lVert p \rVert^2.$$
The quadratic score was proposed by Wecker (1989) in the specific context of predictions
for time series of counts.
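Under the same conventions as the sketch above (a predictive pmf truncated at a point beyond which the remaining mass is negligible), the quadratic score can be computed as follows; this is our illustration, not code from the chapter.

```python
import numpy as np

def quadratic_score(pmf, x):
    """Quadratic score: -2 p(x) + ||p||^2, where ||p||^2 = sum_j p(j)^2.

    pmf holds the predictive probability masses p(0), p(1), ..., truncated so that
    the omitted tail contributes negligibly to ||p||^2.
    """
    pmf = np.asarray(pmf)
    return -2.0 * pmf[x] + np.sum(pmf ** 2)
```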
The nal scoring measure we consider is the ranked probability score dened by
rps(F(x
t
|F
t1
)) =
[F(j) 1(x
t
j)]
2
,
j=0
where 1(·) is an indicator function. This rule assesses the sum of squared differences between
the cumulative probabilities of the modeled conditional distribution and the step function implied
by the observation. Hence, it penalizes more severely when the predictions are far from the observed
outcomes.
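A corresponding sketch for the ranked probability score, again truncating the infinite sum at the end of the supplied support (where F(j) is essentially one and the summands vanish), is given below; it is an illustration under the same hypothetical conventions as above.

```python
import numpy as np

def ranked_probability_score(pmf, x):
    """Ranked probability score: sum_j [F(j) - 1(x <= j)]^2 over the truncated support."""
    pmf = np.asarray(pmf)
    cdf = np.cumsum(pmf)                             # predictive cdf F(j)
    step = (np.arange(len(pmf)) >= x).astype(float)  # indicator 1(x <= j)
    return np.sum((cdf - step) ** 2)
```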
Some authors (including Weiß 2009 and Zhu 2011) proposed the use of information crite-
ria, such as the popular Akaike information criterion (AIC), as means of choosing between
nonnested time series models for counts, despite the fact that little is known about their
ability to do so in this framework. In addition, Psaradakis et al. (2009) examined the abil-
ity of some popular information criteria, like AIC, the Bayesian information criterion (BIC)
and the Hannan and Quinn criterion (HQ) to distinguish between some nonlinear time
series models. They argued that all three criteria have a useful role to play in a time series
model selection exercise. Although their study was not based on count time series models
directly, it may serve as some justication for using such model selection devices in the
present context.
Scoring rules and values for two information criteria for a PINAR(1) model fitted to the
cuts data are as follows: logarithmic score 2.4549, quadratic score 0.9001, ranked probability
score 1.5932, AIC = 290.2678, and BIC = 294.4490. For the iceberg order data, the corre-
sponding values are as follows: logarithmic score 1.3014, quadratic score 0.6500, ranked
probability score 0.5122, AIC = 1414.9705, and BIC = 1422.4586. These values are simply
reported here, but will be employed in the following section to facilitate comparison of the
simple PINAR(1) model to others fitted to each data set.
9.4.3 Cuts and Iceberg Data Revisited
It will now be very clear that the basic PINAR(1) models fitted to the two real life data sets
introduced in Section 9.2 have been revealed to be deficient according to a range of criteria.
We are now in a position to reconsider specification of appropriate count time series models
for these data.
Consider first the cuts data. Diagnostic analyses provided in earlier sections after fitting
a simple PINAR(1) model to these data reveal a number of difficulties. First, while the mean
of the Pearson residuals is close to zero, their variance is considerably larger than unity (at
1.607). Next, the dependence structure in the data is not well captured by the model. This is
evident in the left-hand panels of two figures: from the correlogram of the Pearson residuals
in Figure 9.4 and from the parametric resampling exercise depicted in Figure 9.3. Possible
evidence of distributional misspecification is available in the PIT in the relevant panel of
Figure 9.6. In addition, an analysis of the component residuals of Section 9.3 reveals unaccounted
variation in the data and the graphical evidence in Figure 9.5 may be indicative
of unmodeled seasonal variation. From a pragmatic point of view, given the lower arrivals
p-value for the IM test reported previously and the fact that seasonal arrivals could very
well induce seasonal departures, it seems reasonable to account for this variation by
modifying the arrival process first.
In seeking to remedy the aforementioned deficiencies in the PINAR(1) model with no
covariates, we undertake a limited specification search. This leads us to propose fitting a
GP(1) model of the form (9.1) with time-varying innovation rate $\lambda_t$ to the data. The resultant
fitted model is (estimated asymptotic standard errors are given in parentheses below
parameter estimates)
$$\hat{X}_t = R_t\bigl(X_{t-1};\ \underset{(0.072)}{0.478}\bigr) + \varepsilon_t, \quad \text{where } \hat{\varepsilon}_t \sim \mathrm{GP}\bigl(\hat{\lambda}_t,\ \underset{(0.066)}{0.165}\bigr),$$
and
$$\hat{\lambda}_t = \exp\bigl\{\underset{(0.190)}{0.942} - \underset{(0.106)}{0.216}\,\sin(2\pi t/12) - \underset{(0.110)}{0.333}\,\cos(2\pi t/12)\bigr\}.$$
It can be seen that estimated coefficients relating to seasonal effects are both statistically
different from zero at most conventional significance levels, as is the dispersion parameter
of the GP distribution (p-value 0.0125). The values for the various scoring rules and the
information criteria are as follows: logarithmic score 2.3252, quadratic score 0.8840, ranked
probability score 1.4645, AIC = 277.0928, and BIC = 284.0615. All of these are lower than
their counterparts provided toward the end of Section 9.4.2 for the PINAR(1) model with
no covariates. Further summary statistics are as follows: variance of the Pearson residuals:
1.0165 and uniformity test, G, of the PIT histogram: 1.9107 (p-value 0.9928). On the evidence
presented, a researcher would clearly prefer the GP(1) model with deterministic seasonality
to the original PINAR(1) model.
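To illustrate the deterministic seasonal component, the sketch below evaluates the fitted innovation rate over one annual cycle using the point estimates reported above; the code is ours and simply reproduces the fitted exponential-harmonic form (the signs of the harmonic coefficients follow the reconstruction given above).

```python
import numpy as np

def lambda_hat(t):
    """Fitted seasonal innovation rate of the GP(1) model for the cuts data (period 12)."""
    return np.exp(0.942 - 0.216 * np.sin(2 * np.pi * t / 12)
                        - 0.333 * np.cos(2 * np.pi * t / 12))

months = np.arange(1, 13)
print(np.round(lambda_hat(months), 3))   # innovation mean varies over the seasonal cycle
```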
Some diagnostic plots relating to this new model specification are the subject of
Figure 9.7. It is readily seen from all three panels in the figure that there is no evidence of
model misspecification. These should be compared and contrasted with comparable panels
in Figures 9.3, 9.4, and 9.6. Hence, a simple change in innovation distribution, together
with allowing a time-varying innovation mean, leads to a marked improvement in the suitability
of the (new) model for which the methods discussed do not reveal any statistical
inadequacies.
Turning to the iceberg order data, diagnostic results reported in earlier sections after fitting
a PINAR(1) model clearly reject this initial model. However, evidence for distributional
misspecification is not as clear-cut as for the cuts data set. The variance of the Pearson residuals
is larger than unity at 1.289. But the PIT histogram in the lower left panel of Figure 9.6
displays only limited departure from uniformity and the G-statistic corroborates this by
not rejecting the null of a uniform PIT histogram (p-value = 0.395). A misspecification of
the dependence structure is evident from the right-hand panels of Figures 9.3, 9.4 and the
bottom right panel in Figure 9.6. In contrast to the previous data set, we do not infer that
a seasonal pattern is unaccounted for, as some experimentation (not reported here) shows
no improvement over the basic PINAR(1) model.
A limited specication search (details of which again go unreported to save space) leads
us to propose a model of the form (9.1) with no covariates for the iceberg order data, but
with GP innovations and associated random operator of Joe (1996). The proposed DGP
sets p = 2 and is denoted a GP(2) model. Note that the model cannot be written in the
form (9.2), since the random operator R
t
(F
t1
, α) of (9.1) has two lags in F
t1
,but the
dependence parameter vector α has three elements. By closure under convolution, this
leads to the marginal distribution of the counts being taken to be GP. The resultant tted
model is (estimated asymptotic standard errors again in parentheses)
X
ˆ
t
= R
t
(X
t1
; 0.1954 , 0.046 , 0.4671 ) ε, where εˆ ∼ GP(0.3259, 0.1696 ).
(0.0129) (0.0139) (0.0268) (0.0262) (0.0255)
FIGURE 9.7
Diagnostics for a GP(1) model with exogenous variables fitted to the cuts data. (Panels: ACF of Pearson residuals, PIT histogram, parametric bootstrap.)
All the parameter estimates in α are positive and significantly different from zero. The
overdispersion parameter η is also significant (the p-value is zero to four decimal places).
The values for the various scoring rules and information criteria are as follows: logarithmic
score 1.2592, quadratic score 0.6376, ranked probability score 0.5024, AIC = 1370.00, and
BIC = 1382.48. The scoring rules and information criteria, as compared to those reported
in Section 9.4.2, uniformly favor the GP(2) specification over the PINAR(1) model. Further
summary statistics are as follows: variance of the Pearson residuals: 1.0001 and unifor-
mity test of the PIT histogram: 0.9958 (p-value 0.9994). The diagnostic plots are provided
in Figure 9.8. None of the three panels indicate model misspecication.
9.5 Evidence with Artificial Data
The evidence on the use of model validation and diagnostic methods provided in Section
9.4.3 relates only to two data sets for which deficiencies in a PINAR(1) model can be highlighted
and, perhaps, rectified by simple model respecifications. To further gauge and
illustrate the performance of the diagnostic tools presented in previous sections, we report
FIGURE 9.8
Diagnostics for a GP(2) model fitted to the iceberg order data. (Panels: ACF of Pearson residuals, PIT histogram, parametric bootstrap.)
the results of some simulation experiments seeking to reflect additional common situations
that might be faced by applied workers wishing to assess the adequacy of a fitted
model for count time series. The purpose of this section is to report the results of four such
experiments.
The first experiment aims to analyze the ability of the diagnostic devices, including scoring
rules, to detect a misspecification in the distributional assumption of the innovations in
a proposed model. In particular, it reflects a common situation in applied work where the
count time series exhibits marginal overdispersion reflected by a variance-to-mean ratio
greater than one.
Data are generated from a first-order integer autoregressive process (9.1) with innovations
$\varepsilon_t \sim \mathrm{GP}(\lambda, \eta)$ and the random operator $R_t(\cdot)$ proposed by Joe (1996); this is the setup
denoted GP(1) in Section 9.1. The specific form of the GP distribution we employ is
$$p(\varepsilon) = \lambda(\lambda + \varepsilon\eta)^{\varepsilon - 1} \exp(-\lambda - \varepsilon\eta)/\varepsilon!, \qquad \varepsilon = 0, 1, 2, \ldots,\ \lambda > 0,\ \eta \in [0, 1).$$
Further details can be found, inter alia, in Jung and Tremayne (2011b).
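The generalized Poisson probability mass function just displayed is straightforward to evaluate directly; the sketch below (our illustration, not code from the chapter) checks that the probabilities sum to approximately one and that the innovation mean equals λ/(1 − η) for the innovation parameter values used in the simulation experiment reported next.

```python
import numpy as np
from math import exp, factorial

def gp_pmf(k, lam, eta):
    """Generalized Poisson pmf: lam * (lam + k*eta)**(k - 1) * exp(-lam - k*eta) / k!."""
    return lam * (lam + k * eta) ** (k - 1) * exp(-lam - k * eta) / factorial(k)

lam, eta = 0.8, 0.2                                            # innovation parameters used below
probs = np.array([gp_pmf(k, lam, eta) for k in range(100)])    # truncated support 0..99
print(probs.sum())                                             # approximately 1
print((np.arange(100) * probs).sum())                          # innovation mean lam/(1 - eta) = 1.0
```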
For the simulation experiment, we use the following parameter values: $\alpha_1 = 0.6$, $\lambda = 0.8$,
and $\eta = 0.2$. This leads to a GP-distributed count time series with (theoretical) mean and
variance of 2.5 and 3.9, respectively, and moderate dependence structure (the $\tau$th autocorrelation
function ordinate $\rho(\tau) = 0.6^{\tau}$, for $\tau = 1, 2, \ldots$). Here, and in conjunction with