been detected on the basis of a set of standard residual diagnostics. In addition, Pavlopoulos and Karlis (2008) apply the method by using tail probabilities and log-moments as their functionals of interest. The usefulness of the parametric bootstrap approach is particularly highlighted here, as formulae for both characteristics are generally unavailable for count time series models.
We now introduce two real-life data sets to be employed for illustrative purposes at various points in the chapter. An initial (PINAR(1)) model is fitted to each, and the Tsay procedure of this section is then used to assess a model's suitability.
The rst one consists of 120 counts of claimants collecting wage loss benet for injuries in
the workplace at one specic service delivery location of the Workers Compensation Board
of British Columbia, Canada. Only injuries due to cuts and lacerations are considered; we
refer to these data as the cuts data. The data set is monthly and covers the period from
January 1985 to December 1994. It has been analyzed previously by Freeland and McCabe
(2004), for example. The sample mean and variance of the data are 6.133 and 11.797, respec-
tively, and Figure 9.1 provides a time series plot together with a summary of sample serial
correlation and partial correlation properties. Note that the latter should be thought of as a
heuristic for count data, because, to the best of our knowledge, theoretical results for partial
autocorrelations with non-Gaussian data have not yet been established.
Our second exemplar data set consists of counts of iceberg orders in the order book on the ask side of Deutsche Telekom stock sampled every 15 min on the XETRA system operated by the Deutsche Börse. Iceberg orders constitute a particular type of order in many limit order markets. Their nomenclature is derived from the fact that only a small part of the order (the tip of the iceberg) is visible in the order book, while the remainder of the order is hidden. Iceberg order data on different stocks from the one considered here have been studied elsewhere, for example, by Jung and Tremayne (2011b) and McCabe et al. (2011). The data set analyzed relates to 32 trading days in the first quarter of 2004. With 8.5 h of trading per day (9 am to 5:30 pm), there are 34 observations per day, giving a count time series of 1088 observations. The sample mean and variance are, respectively, 1.39 and 2.08. Figure 9.2 depicts the raw data and its serial correlation structure. These data, which we refer to as the iceberg order data, evidence a dynamic structure that decays more slowly than that of the cuts data.
FIGURE 9.1
Time series plot (a) and SACF/SPACF plots (b) of the cuts data.
FIGURE 9.2
Time series plot (a) and SACF/SPACF plots (b) of the iceberg order data.
FIGURE 9.3
Parametric resampling diagnostics after fitting a PINAR(1) model to the cuts data (a) and the iceberg order data (b). The 95% acceptance bounds are shown as "+" symbols and the SACF ordinates by dots.
Figure 9.3 provides evidence of the usefulness of the parametric resampling method described earlier and displays relevant graphs after a prototypical PINAR(1) model has been fitted to each data set. The left panel refers to the cuts data and the right panel to the iceberg order data. Both indicate that neither fitted model adequately captures the dynamics in the respective data set, situations that we shall seek to remedy in due course.
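To make the resampling diagnostic concrete, the following is a minimal sketch of a parametric bootstrap of this kind (written in Python; the function names, the seed, and the use of the SACF as the functional of interest are illustrative assumptions, not the exact procedure or code of the chapter). A fitted PINAR(1) model is used to simulate many artificial series, the SACF is computed for each, and pointwise 95% acceptance bounds are taken from the simulated quantiles; observed ordinates falling outside the bounds, as in Figure 9.3, indicate that the fitted model fails to reproduce that feature of the data.

```python
import numpy as np

rng = np.random.default_rng(12345)

def simulate_pinar1(n, alpha, lam, x0=0):
    """Simulate a PINAR(1) path X_t = alpha o X_{t-1} + eps_t
    with binomial thinning and Poisson(lam) arrivals."""
    x = np.empty(n, dtype=int)
    prev = x0
    for t in range(n):
        prev = rng.binomial(prev, alpha) + rng.poisson(lam)
        x[t] = prev
    return x

def sacf(x, nlags):
    """Sample autocorrelations at lags 1, ..., nlags."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, nlags + 1)])

def sacf_acceptance_bounds(data, alpha_hat, lam_hat, nlags=15, reps=1000, level=0.95):
    """Pointwise parametric-bootstrap acceptance bounds for the SACF,
    obtained by simulating repeatedly from the fitted PINAR(1) model."""
    sims = np.array([sacf(simulate_pinar1(len(data), alpha_hat, lam_hat), nlags)
                     for _ in range(reps)])
    lo, hi = np.quantile(sims, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return sacf(data, nlags), lo, hi
```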
9.3 Residual Analysis
Diagnostic checks based on model residuals have a long tradition in time series analysis; see, inter alia, discussion in the seminal work of Box and Jenkins (1970). In particular, for linear Gaussian time series models, both formal testing procedures and graphical tools exist
and are implemented in standard statistical software packages; see, for example, Li (2004). For time series models for counts, however, this is not the case. In the following subsection, we discuss standardized, or Pearson, residuals and how they can be fruitfully employed in model diagnostics. In the subsection to follow, we present an interesting type of residual available for integer autoregressive models known as component residuals; these offer an opportunity to perform diagnostics on each of the two right-hand side parts of (9.1) separately.
9.3.1 Pearson Residuals
Raw residuals are defined as deviations of X_t from its conditional expectation given the past, that is, for t = 1, ..., T, as

\[
r_t = X_t - E_{t-1}[X_t], \tag{9.3}
\]

where E_{t-1} is the expectation taken conditional on F_{t-1}, which contains the relevant past history of the process, including possible covariates. Pearson residuals are defined as the scaled version of the raw residuals

\[
e_t = \frac{r_t}{\mathrm{Var}_{t-1}[X_t]^{1/2}}. \tag{9.4}
\]

For practical implementation, the population quantities in (9.4) have to be replaced by their estimated counterparts. If a model fitted to data is correctly specified, these residuals should exhibit mean zero, variance one, and no significant serial correlation. For the integer autoregressive class of model (9.1), these properties are readily shown.
Harvey and Fernandes (1989) suggest a number of model diagnostic checks based on the Pearson residuals utilizing their sample mean and variance (and proximity to zero and unity, respectively), together with assessing the presence of any unwanted dynamic structure in them via computing residual autocorrelations at different time lags and depicting them in a residual autocorrelation plot.
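For the PINAR(1) special case with binomial thinning and Poisson(λ) innovations, the conditional moments have simple closed forms, E_{t-1}[X_t] = αX_{t-1} + λ and Var_{t-1}[X_t] = α(1 − α)X_{t-1} + λ, so the checks just described are easy to carry out once parameter estimates are available. A minimal sketch in Python follows (the function name and interface are illustrative assumptions, not code from the chapter):

```python
import numpy as np

def pinar1_pearson_residuals(x, alpha, lam):
    """Pearson residuals (9.4) for a fitted PINAR(1) model with
    binomial thinning parameter alpha and Poisson(lam) innovations.

    Conditional moments used:
      E_{t-1}[X_t]   = alpha * X_{t-1} + lam
      Var_{t-1}[X_t] = alpha * (1 - alpha) * X_{t-1} + lam
    """
    x = np.asarray(x, dtype=float)
    cond_mean = alpha * x[:-1] + lam
    cond_var = alpha * (1.0 - alpha) * x[:-1] + lam
    raw = x[1:] - cond_mean            # raw residuals (9.3)
    return raw / np.sqrt(cond_var)     # Pearson residuals (9.4)

# In the spirit of Harvey and Fernandes (1989), one would then check that the
# residuals have mean near 0 and variance near 1, and inspect their sample
# autocorrelations for remaining dynamic structure.
```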
FIGURE 9.4
ACF plots of Pearson residuals after fitting a PINAR(1) model to the cuts data (a) and the iceberg order data (b).

Figure 9.4 provides ACF plots of Pearson residuals for the two data sets introduced in Section 9.2 after fitting a PINAR(1) model to each. Note that these plots (as do all other
residual ACF plots in the chapter) display dashed lines representing the usual approximate two standard error bounds for departure of the relevant ordinate from zero. The left panel refers to the cuts data set and the right panel to the iceberg order data. Both indicate that this simple model does not adequately capture the dependencies in the data. From the left panel, the dynamic misspecification previously observed in Figure 9.3 is not so evident. However, the residual autocorrelation at lag 12 appears quite large and may suggest a neglected seasonal component. The right panel, associated with the iceberg order data, tells a similar story to that already gleaned from Figure 9.3.
The sample means of the Pearson residuals for both data sets are close to zero. However, their variances are 1.607 and 1.289 for the cuts and iceberg order data, respectively. Both numbers are considerably larger than unity, suggesting potential misspecification of the Poisson innovation distribution specified in the PINAR(1) fitted model.
9.3.2 Component Residuals
At the outset, note that the two parts of the right-hand side of (9.1) and the special case (9.2) are typically unobserved. The first part can be thought of as a specification for the (random) number of survivors from stochastic operations performed at, or prior to, time t, or its complement, the number of departures. The second part reflects the number of new arrivals to the system at time t. It is convenient to derive the concepts in the following for the model variant provided in (9.2). For this case, though each of the α_k ∘ X_{t-k} is unobservable, following Freeland and McCabe (2004), we define a set (t = 1, ..., T) of departure residuals for each operator in (9.2) (k ∈ {1, ..., p}) by

\[
r_{k,t} = E_t[\alpha_k \circ X_{t-k}] - E_{t-1}[\alpha_k \circ X_{t-k}], \tag{9.5}
\]
where E_t is the expectation conditional on all information up to and including time t. Generally, E_t[α_k ∘ X_{t-k}] ≠ E_{t-1}[α_k ∘ X_{t-k}], as the conditioning sets are different. Similarly, define the set of arrivals residuals (t = 1, ..., T) as

\[
r_{p+1,t} = E_t[\varepsilon_t] - E_{t-1}[\varepsilon_t]. \tag{9.6}
\]
By considering the sum of the set of p + 1 component residuals thereby defined as

\[
\sum_{k=1}^{p+1} r_{k,t} = \sum_{k=1}^{p} E_t[\alpha_k \circ X_{t-k}] + E_t[\varepsilon_t] - \left( \sum_{k=1}^{p} E_{t-1}[\alpha_k \circ X_{t-k}] + E_{t-1}[\varepsilon_t] \right) = E_t[X_t] - E_{t-1}[X_t] = X_t - E_{t-1}[X_t] = r_t,
\]
it is seen that the component residuals add up to the usual raw residuals for model (9.2).
An advantage of having a set of residuals associated with each unobserved part of the model is that they can be used to identify the source of problems with a fitted model in the following way. Initially, any of this set of p + 1 residual series may be used to check specification, either informally through the use of time series plots (or other graphical devices) and/or more formally through the construction of statistical specification tests. If some component residual indicates that the corresponding component of the model is not well specified, it may be possible to suggest modifications for improvement. For example, a cyclical pattern in a residual plot may indicate the presence of seasonality, and this could formally be tested for using residual autocorrelation methods. Additional lags or covariates could then be added to improve the model. A little care must be taken, however, in examining several sets of residuals, as these sets are correlated with one another (as are individual residuals within any set).
Just like the Pearson residuals of the previous subsection, component residuals have to be estimated, as their definition requires knowledge of the α_k and whatever parameters are involved in ε_t. But, given estimates, residual sets are typically easy to compute by simply plugging in the estimates for the unknown parameters. As an example, consider the pth order model (9.2) where the thinning operator is the standard binomial thinning one and suppose the arrivals are i.i.d. Poisson random variables with mean λ. For this model, the conditional distribution of X_t given the past information F_{t-1} = (X_{t-1}, ..., X_{t-p}) is based on the binomial distribution and is given by
\[
P(X_t \mid F_{t-1}) = \sum_{i_1=0}^{\min[X_{t-1},\,X_t]} \; \sum_{i_2=0}^{\min[X_{t-2},\,X_t-i_1]} \cdots \sum_{i_p=0}^{\min[X_{t-p},\,X_t-(i_1+\cdots+i_{p-1})]} \binom{X_{t-1}}{i_1}\alpha_1^{i_1}(1-\alpha_1)^{X_{t-1}-i_1} \binom{X_{t-2}}{i_2}\alpha_2^{i_2}(1-\alpha_2)^{X_{t-2}-i_2} \cdots \binom{X_{t-p}}{i_p}\alpha_p^{i_p}(1-\alpha_p)^{X_{t-p}-i_p} \, \frac{e^{-\lambda}\lambda^{X_t-(i_1+\cdots+i_p)}}{\big(X_t-(i_1+\cdots+i_p)\big)!}. \tag{9.7}
\]
The residual sets for t = 1, ..., T are as introduced in (9.5) and (9.6). A little algebra (see
Freeland and McCabe 2004) shows
\[
E_t[\alpha_k \circ X_{t-k}] = \frac{\alpha_k X_{t-k}\, P(X_t - 1 \mid X_{t-1}, \ldots, X_{t-k} - 1, \ldots, X_{t-p})}{P(X_t \mid F_{t-1})} \tag{9.8}
\]

and

\[
E_t[\varepsilon_t] = \frac{\lambda\, P(X_t - 1 \mid X_{t-1}, \ldots, X_{t-p})}{P(X_t \mid F_{t-1})}, \tag{9.9}
\]

where we use the convention that P(i | j_1, ..., j_p) = 0 if any of i or j_1, ..., j_p is less than zero. Thus, given any suitable estimates α̂_k and λ̂, we can readily compute the estimated residual sets {r̂_{k,t}; t = 1, ..., T} using (9.7) in conjunction with (9.8) and (9.9) for k ∈ {1, ..., p + 1}.
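As a concrete illustration of these plug-in computations in the simplest case, the following sketch (Python; the function names are illustrative assumptions, not code from the chapter) evaluates the p = 1 versions of (9.7)-(9.9) for a fitted PINAR(1) model and returns the estimated departure and arrival residual sets. Here E_{t-1}[α ∘ X_{t-1}] = αX_{t-1} and E_{t-1}[ε_t] = λ, so the two sets sum to the raw residuals X_t − (αX_{t-1} + λ), in line with the identity above.

```python
import numpy as np
from math import comb, exp, factorial

def pinar1_cond_pmf(x_t, x_prev, alpha, lam):
    """P(X_t = x_t | X_{t-1} = x_prev) for a PINAR(1) model,
    i.e. the p = 1 special case of (9.7)."""
    if x_t < 0 or x_prev < 0:
        return 0.0  # convention stated below (9.9)
    return sum(comb(x_prev, i) * alpha**i * (1 - alpha)**(x_prev - i)
               * exp(-lam) * lam**(x_t - i) / factorial(x_t - i)
               for i in range(min(x_prev, x_t) + 1))

def pinar1_component_residuals(x, alpha, lam):
    """Departure and arrival residuals (9.5)-(9.6) for p = 1,
    using the closed forms (9.8) and (9.9) with plug-in estimates."""
    x = np.asarray(x, dtype=int)
    dep, arr = [], []
    for t in range(1, len(x)):
        denom = pinar1_cond_pmf(x[t], x[t - 1], alpha, lam)
        # E_t[alpha o X_{t-1}] from (9.8); E_{t-1}[alpha o X_{t-1}] = alpha * X_{t-1}
        e_dep = alpha * x[t - 1] * pinar1_cond_pmf(x[t] - 1, x[t - 1] - 1, alpha, lam) / denom
        dep.append(e_dep - alpha * x[t - 1])
        # E_t[eps_t] from (9.9); E_{t-1}[eps_t] = lambda
        e_arr = lam * pinar1_cond_pmf(x[t] - 1, x[t - 1], alpha, lam) / denom
        arr.append(e_arr - lam)
    return np.array(dep), np.array(arr)

# Sanity check: the two component residual series sum to the raw residuals
# r_t = X_t - (alpha * X_{t-1} + lam).
```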
To demonstrate the use of component residuals, we compute and analyze them for the
cuts data. Figure 9.5 provides ACF plots of the component residuals for the cuts data when
the PINAR(1) model has been tted. It can be compared with the left panel of Figure 9.4.
The raw residual is thereby decomposed into two components.
strong evidence of seasonality based on the graphical and correlation evidence, with much
greater variability being seen in the arrivals process. The comparatively large residual ACF
ordinates at lags 2 and 12 in Figure 9.4 are seen to be attributable to both departure and
arrival residuals with regard to the latter ordinate, but only to the arrival component in the
case of the former.