been detected on the basis of a set of standard residual diagnostics. In addition, Pavlopoulos and Karlis (2008) apply the method by using tail probabilities and log-moments as their functionals of interest. The usefulness of the parametric bootstrap approach is particularly highlighted here, as formulae for both characteristics are generally unavailable for count time series models.
We now introduce two real-life data sets to be employed for illustrative purposes at various points in the chapter. An initial (PINAR(1)) model is fitted to each, and the Tsay procedure of this section is then used to assess a model's suitability.
The rst one consists of 120 counts of claimants collecting wage loss benet for injuries in
the workplace at one specic service delivery location of the Workers Compensation Board
of British Columbia, Canada. Only injuries due to cuts and lacerations are considered; we
refer to these data as the cuts data. The data set is monthly and covers the period from
January 1985 to December 1994. It has been analyzed previously by Freeland and McCabe
(2004), for example. The sample mean and variance of the data are 6.133 and 11.797, respec-
tively, and Figure 9.1 provides a time series plot together with a summary of sample serial
correlation and partial correlation properties. Note that the latter should be thought of as a
heuristic for count data, because, to the best of our knowledge, theoretical results for partial
autocorrelations with non-Gaussian data have not yet been established.
Our second exemplar data set consists of counts of iceberg orders in the order book on the ask side of Deutsche Telekom stock sampled every 15 min on the XETRA system operated by the Deutsche Börse. Iceberg orders constitute a particular type of order in many limit order markets. Their nomenclature is derived from the fact that only a small part of the order (the tip of the iceberg) is visible in the order book, while the remainder of the order is hidden. Iceberg order data on different stocks from the one considered here have been studied elsewhere, for example, by Jung and Tremayne (2011b) and McCabe et al. (2011). The data set analyzed relates to 32 trading days in the first quarter of 2004. With 8.5 h of trading per day (9 am to 5:30 pm), there are 34 observations per day, giving a count time series of 1088 observations. The sample mean and variance are, respectively, 1.39 and 2.08. Figure 9.2 depicts the raw data and its serial correlation structure. These data, which we refer to as the iceberg order data, evidence a dynamic structure that decays more slowly than that of the cuts data.
FIGURE 9.1
Time series plot (a) and SACF/SPACF plots (b) of the cuts data.
FIGURE 9.2
Time series plot (a) and SACF/SPACF plots (b) of the iceberg order data.
FIGURE 9.3
Parametric resampling diagnostics after fitting a PINAR(1) model to the cuts data (a) and the iceberg order data (b). The 95% acceptance bounds are shown as "+" symbols and the SACF ordinates by dots.
Figure 9.3 provides evidence of the usefulness of the parametric resampling method described earlier and displays relevant graphs after a prototypical PINAR(1) model has been fitted to each data set. The left panel refers to the cuts data and the right panel to the iceberg order data. Both indicate that neither fitted model adequately captures the dynamics in the respective data set, situations that we shall seek to remedy in due course.
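To make the resampling diagnostic concrete, the following is a minimal sketch of a parametric bootstrap of this kind (written in Python; the function names, the seed, and the use of the SACF as the functional of interest are illustrative assumptions, not the exact procedure or code of the chapter). A fitted PINAR(1) model is used to simulate many artificial series, the SACF is computed for each, and pointwise 95% acceptance bounds are taken from the simulated quantiles; observed ordinates falling outside the bounds, as in Figure 9.3, indicate that the fitted model fails to reproduce that feature of the data.

```python
import numpy as np

rng = np.random.default_rng(12345)

def simulate_pinar1(n, alpha, lam, x0=0):
    """Simulate a PINAR(1) path X_t = alpha o X_{t-1} + eps_t
    with binomial thinning and Poisson(lam) arrivals."""
    x = np.empty(n, dtype=int)
    prev = x0
    for t in range(n):
        prev = rng.binomial(prev, alpha) + rng.poisson(lam)
        x[t] = prev
    return x

def sacf(x, nlags):
    """Sample autocorrelations at lags 1, ..., nlags."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, nlags + 1)])

def sacf_acceptance_bounds(data, alpha_hat, lam_hat, nlags=15, reps=1000, level=0.95):
    """Pointwise parametric-bootstrap acceptance bounds for the SACF,
    obtained by simulating repeatedly from the fitted PINAR(1) model."""
    sims = np.array([sacf(simulate_pinar1(len(data), alpha_hat, lam_hat), nlags)
                     for _ in range(reps)])
    lo, hi = np.quantile(sims, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return sacf(data, nlags), lo, hi
```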
9.3 Residual Analysis
Diagnostic checks based on model residuals have a long tradition in time series analysis; see, inter alia, discussion in the seminal work of Box and Jenkins (1970). In particular, for linear Gaussian time series models, both formal testing procedures and graphical tools exist
and are implemented in standard statistical software packages; see, for example, Li (2004). For time series models for counts, however, this is not the case. In the following subsection, we discuss standardized, or Pearson, residuals and how they can be fruitfully employed in model diagnostics. In the subsection to follow, we present an interesting type of residual available for integer autoregressive models known as component residuals; these offer an opportunity to perform diagnostics on each of the two right-hand side parts of (9.1) separately.
9.3.1 Pearson Residuals
Raw residuals are defined as deviations of X_t from its conditional expectation given the past, that is, for t = 1, ..., T, as

\[
r_t = X_t - E_{t-1}[X_t], \tag{9.3}
\]

where E_{t-1} is the expectation taken conditional on F_{t-1}, which contains the relevant past history of the process, including possible covariates. Pearson residuals are defined as the scaled version of the raw residuals

\[
e_t = \frac{r_t}{\mathrm{Var}_{t-1}[X_t]^{1/2}}. \tag{9.4}
\]

For practical implementation, the population quantities in (9.4) have to be replaced by their estimated counterparts. If a model fitted to data is correctly specified, these residuals should exhibit mean zero, variance one, and no significant serial correlation. For the integer autoregressive class of model (9.1), these properties are readily shown.
Harvey and Fernandes (1989) suggest a number of model diagnostic checks based on the Pearson residuals utilizing their sample mean and variance (and proximity to zero and unity, respectively), together with assessing the presence of any unwanted dynamic structure in them via computing residual autocorrelations at different time lags and depicting them in a residual autocorrelation plot.
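For the PINAR(1) special case with binomial thinning and Poisson(λ) innovations, the conditional moments have simple closed forms, E_{t-1}[X_t] = αX_{t-1} + λ and Var_{t-1}[X_t] = α(1 − α)X_{t-1} + λ, so the checks just described are easy to carry out once parameter estimates are available. A minimal sketch in Python follows (the function name and interface are illustrative assumptions, not code from the chapter):

```python
import numpy as np

def pinar1_pearson_residuals(x, alpha, lam):
    """Pearson residuals (9.4) for a fitted PINAR(1) model with
    binomial thinning parameter alpha and Poisson(lam) innovations.

    Conditional moments used:
      E_{t-1}[X_t]   = alpha * X_{t-1} + lam
      Var_{t-1}[X_t] = alpha * (1 - alpha) * X_{t-1} + lam
    """
    x = np.asarray(x, dtype=float)
    cond_mean = alpha * x[:-1] + lam
    cond_var = alpha * (1.0 - alpha) * x[:-1] + lam
    raw = x[1:] - cond_mean            # raw residuals (9.3)
    return raw / np.sqrt(cond_var)     # Pearson residuals (9.4)

# In the spirit of Harvey and Fernandes (1989), one would then check that the
# residuals have mean near 0 and variance near 1, and inspect their sample
# autocorrelations for remaining dynamic structure.
```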
FIGURE 9.4
ACF plots of Pearson residuals after fitting a PINAR(1) model to the cuts data (a) and the iceberg order data (b).

Figure 9.4 provides ACF plots of Pearson residuals for the two data sets introduced in Section 9.2 after fitting a PINAR(1) model to each. Note that these plots (as do all other
residual ACF plots in the chapter) display dashed lines representing the usual approximate two standard error bounds for departure of the relevant ordinate from zero. The left panel refers to the cuts data set and the right panel to the iceberg order data. Both indicate that this simple model does not adequately capture the dependencies in the data. From the left panel, the dynamic misspecification previously observed in Figure 9.3 is not so evident. However, the residual autocorrelation at lag 12 appears quite large and may suggest a neglected seasonal component. The right panel, associated with the iceberg order data, tells a similar story to that already gleaned from Figure 9.3.
The sample means of the Pearson residuals for both data sets are close to zero. However, their variances are 1.607 and 1.289 for the cuts and iceberg order data, respectively. Both numbers are considerably larger than unity, suggesting potential misspecification of the Poisson innovation distribution specified in the PINAR(1) fitted model.
9.3.2 Component Residuals
At the outset, note that the two parts of the right-hand side of (9.1) and the special case (9.2) are typically unobserved. The first part can be thought of as a specification for the (random) number of survivors from stochastic operations performed at, or prior to, time t, or its complement, the number of departures. The second part reflects the number of new arrivals to the system at time t. It is convenient to derive the concepts in the following for the model variant provided in (9.2). For this case, though each of the α_k ∘ X_{t-k} is unobservable, following Freeland and McCabe (2004), we define a set (t = 1, ..., T) of departure residuals for each operator in (9.2) (k ∈ {1, ..., p}) by

\[
r_{k,t} = E_t[\alpha_k \circ X_{t-k}] - E_{t-1}[\alpha_k \circ X_{t-k}], \tag{9.5}
\]
where E_t is the expectation conditional on all information up to and including time t. Generally, E_t[α_k ∘ X_{t-k}] ≠ E_{t-1}[α_k ∘ X_{t-k}], as the conditioning sets are different. Similarly, define the set of arrivals residuals (t = 1, ..., T) as

\[
r_{p+1,t} = E_t[\varepsilon_t] - E_{t-1}[\varepsilon_t]. \tag{9.6}
\]
By considering the sum of the set of p + 1 component residuals thereby defined as

\[
\sum_{k=1}^{p+1} r_{k,t} = \sum_{k=1}^{p} E_t[\alpha_k \circ X_{t-k}] + E_t[\varepsilon_t] - \left( \sum_{k=1}^{p} E_{t-1}[\alpha_k \circ X_{t-k}] + E_{t-1}[\varepsilon_t] \right) = E_t[X_t] - E_{t-1}[X_t] = X_t - E_{t-1}[X_t] = r_t,
\]
it is seen that the component residuals add up to the usual raw residuals for model (9.2).
An advantage of having a set of residuals associated with each unobserved part of the model is that they can be used to identify the source of problems with a fitted model in the following way. Initially, any of this set of p + 1 residual series may be used to check specification, either informally through the use of time series plots (or other graphical devices) and/or more formally through the construction of statistical specification tests. If some component residual indicates that the corresponding component of the model is not well specified, it may be possible to suggest modifications for improvement. For example, a cyclical pattern in a residual plot may indicate the presence of seasonality, and this could formally be tested for using residual autocorrelation methods. Additional lags or covariates could then be added to improve the model. A little care must be taken, however, in examining several sets of residuals, as these sets are correlated with one another (as are individual residuals within any set).
Just like the Pearson residuals of the previous subsection, component residuals have to be estimated, as their definition requires knowledge of the α_k and whatever parameters are involved in ε_t. But, given estimates, residual sets are typically easy to compute by simply plugging in the estimates for the unknown parameters. As an example, consider the pth order model (9.2) where the thinning operator is the standard binomial thinning one and suppose the arrivals are i.i.d. Poisson random variables with mean λ. For this model, the conditional distribution of X_t given the past information F_{t-1} = (X_{t-1}, ..., X_{t-p}) is based on the binomial distribution and is given by
\[
P(X_t \mid F_{t-1}) = \sum_{i_1=0}^{\min[X_{t-1},\,X_t]} \; \sum_{i_2=0}^{\min[X_{t-2},\,X_t-i_1]} \cdots \sum_{i_p=0}^{\min[X_{t-p},\,X_t-(i_1+\cdots+i_{p-1})]} \binom{X_{t-1}}{i_1}\alpha_1^{i_1}(1-\alpha_1)^{X_{t-1}-i_1} \binom{X_{t-2}}{i_2}\alpha_2^{i_2}(1-\alpha_2)^{X_{t-2}-i_2} \cdots \binom{X_{t-p}}{i_p}\alpha_p^{i_p}(1-\alpha_p)^{X_{t-p}-i_p} \, \frac{e^{-\lambda}\lambda^{X_t-(i_1+\cdots+i_p)}}{\big(X_t-(i_1+\cdots+i_p)\big)!}. \tag{9.7}
\]
The residual sets for t = 1, ..., T are as introduced in (9.5) and (9.6). A little algebra (see
Freeland and McCabe 2004) shows
\[
E_t[\alpha_k \circ X_{t-k}] = \frac{\alpha_k X_{t-k}\, P(X_t - 1 \mid X_{t-1}, \ldots, X_{t-k} - 1, \ldots, X_{t-p})}{P(X_t \mid F_{t-1})} \tag{9.8}
\]

and

\[
E_t[\varepsilon_t] = \frac{\lambda\, P(X_t - 1 \mid X_{t-1}, \ldots, X_{t-p})}{P(X_t \mid F_{t-1})}, \tag{9.9}
\]

where we use the convention that P(i | j_1, ..., j_p) = 0 if any of i or j_1, ..., j_p is less than zero. Thus, given any suitable estimates α̂_k and λ̂, we can readily compute the estimated residual sets {r̂_{k,t}; t = 1, ..., T} using (9.7) in conjunction with (9.8) and (9.9) for k ∈ {1, ..., p + 1}.
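As a concrete illustration of these plug-in computations in the simplest case, the following sketch (Python; the function names are illustrative assumptions, not code from the chapter) evaluates the p = 1 versions of (9.7)-(9.9) for a fitted PINAR(1) model and returns the estimated departure and arrival residual sets. Here E_{t-1}[α ∘ X_{t-1}] = αX_{t-1} and E_{t-1}[ε_t] = λ, so the two sets sum to the raw residuals X_t − (αX_{t-1} + λ), in line with the identity above.

```python
import numpy as np
from math import comb, exp, factorial

def pinar1_cond_pmf(x_t, x_prev, alpha, lam):
    """P(X_t = x_t | X_{t-1} = x_prev) for a PINAR(1) model,
    i.e. the p = 1 special case of (9.7)."""
    if x_t < 0 or x_prev < 0:
        return 0.0  # convention stated below (9.9)
    return sum(comb(x_prev, i) * alpha**i * (1 - alpha)**(x_prev - i)
               * exp(-lam) * lam**(x_t - i) / factorial(x_t - i)
               for i in range(min(x_prev, x_t) + 1))

def pinar1_component_residuals(x, alpha, lam):
    """Departure and arrival residuals (9.5)-(9.6) for p = 1,
    using the closed forms (9.8) and (9.9) with plug-in estimates."""
    x = np.asarray(x, dtype=int)
    dep, arr = [], []
    for t in range(1, len(x)):
        denom = pinar1_cond_pmf(x[t], x[t - 1], alpha, lam)
        # E_t[alpha o X_{t-1}] from (9.8); E_{t-1}[alpha o X_{t-1}] = alpha * X_{t-1}
        e_dep = alpha * x[t - 1] * pinar1_cond_pmf(x[t] - 1, x[t - 1] - 1, alpha, lam) / denom
        dep.append(e_dep - alpha * x[t - 1])
        # E_t[eps_t] from (9.9); E_{t-1}[eps_t] = lambda
        e_arr = lam * pinar1_cond_pmf(x[t] - 1, x[t - 1], alpha, lam) / denom
        arr.append(e_arr - lam)
    return np.array(dep), np.array(arr)

# Sanity check: the two component residual series sum to the raw residuals
# r_t = X_t - (alpha * X_{t-1} + lam).
```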
To demonstrate the use of component residuals, we compute and analyze them for the
cuts data. Figure 9.5 provides ACF plots of the component residuals for the cuts data when
the PINAR(1) model has been tted. It can be compared with the left panel of Figure 9.4.
The raw residual is thereby decomposed into two components.
strong evidence of seasonality based on the graphical and correlation evidence, with much
greater variability being seen in the arrivals process. The comparatively large residual ACF
ordinates at lags 2 and 12 in Figure 9.4 are seen to be attributable to both departure and
arrival residuals with regard to the latter ordinate, but only to the arrival component in the
case of the former.