10
Detection of Change Points in Discrete-Valued
Time Series
Claudia Kirch and Joseph Tadjuidje Kamgaing
CONTENTS
10.1 Introduction
10.2 General Principles of Retrospective Change Point Analysis
10.3 Detection of Changes in Binary Models
10.4 Detection of Changes in Poisson Autoregressive Models
10.5 Simulation and Data Analysis
10.5.1 Binary Autoregressive Time Series
10.5.1.1 Data Analysis: U.S. Recession Data
10.5.2 Poisson Autoregressive Models
10.5.2.1 Data Analysis: Number of Transactions per Minute for Ericsson B Stock
10.6 Online Procedures
10.7 Technical Appendix
10.7.1 Regularity Assumptions
10.7.2 Proofs
References
10.1 Introduction
There has recently been a renewed interest in statistical procedures concerned with the
detection of structural breaks in time series; see, for example, the recent review articles by
Aue and Horváth [2] and Horváth and Rice [16]. The literature contains statistics to detect
simple mean changes, changes in linear regression, and changes in generalized autoregressive
conditionally heteroscedastic (GARCH) models, with approaches ranging from likelihood ratio
to robust M methods (see, e.g., Berkes et al. [3], Davis et al. [6], Hušková and
Marušiaková [26], and Robbins et al. [31]). While at first sight the corresponding statistics
appear very different, most of them are derived using the same principles. In this chapter,
we shed light on those principles, explaining how corresponding statistics and their
respective asymptotic behavior under both the null and alternative hypotheses can be
derived. This enables us to give a unified presentation of change point procedures for
integer-valued time series. Because the methodology considered in this chapter is by no
means limited to these situations, it allows for future extensions in a standardized way.
220 Handbook of Discrete-Valued Time Series
Hudecová [17] and Fokianos et al. [11] propose change point statistics for binary time
series models, while Franke et al. [13] and Doukhan and Kengne [7] consider changes
in Poisson autoregressive models. Related procedures have also been investigated by
Fokianos and Fried [9,10] for integer-valued GARCH and log-linear Poisson autoregressive
time series, respectively, but with a focus on outlier detection and intervention effects
rather than change points.
Section 10.2 explains how change point statistics can be constructed and derives asymptotic
properties under both the null and alternative hypotheses, based on regularity
conditions, which are summarized in Appendix 10.7.1 to lighten the presentation. This
methodology is then applied to binary time series in Section 10.3 and to Poisson
autoregressive models in Section 10.4, generalizing the statistics already discussed in the
literature. In Section 10.5, some simulations as well as applications to real data illustrate
the performance of these procedures. A short review of sequential (also called online)
procedures for count time series is given in Section 10.6. Finally, the proofs are given in
Appendix 10.7.2.
10.2 General Principles of Retrospective Change Point Analysis
Assume that data $Y_1, \ldots, Y_n$ are observed with a possible structural break at the
(unknown) change point $k_0$. We will first look at likelihood ratio tests for structural
breaks before explaining how to generalize these ideas. To this end, we assume that the data
before and after the change can be parameterized by the same likelihood function $L$ but
with different (unknown) parameters $\theta_0, \theta_1 \in \mathbb{R}^d$. A likelihood ratio
approach yields the following statistic:

$$\max_{1 \le k \le n} \ell(k) := \max_{1 \le k \le n} \Bigl( \ell\bigl((Y_1, \ldots, Y_k), \hat\theta_k\bigr) + \ell\bigl((Y_{k+1}, \ldots, Y_n), \hat\theta_k^{\circ}\bigr) - \ell\bigl((Y_1, \ldots, Y_n), \hat\theta_n\bigr) \Bigr),$$

where $\ell(Y, \theta)$ is the log-likelihood function and $\hat\theta_k$ and
$\hat\theta_k^{\circ}$ are the maximum likelihood estimators based on $Y_1, \ldots, Y_k$ and
$Y_{k+1}, \ldots, Y_n$, respectively. The maximum over $k$ is due to the fact that the change
point is unknown, so the likelihood ratio statistic maximizes over all possible change
points. A similar approach based on some asymptotic Bayes statistic leads to a sum-type
statistic, where the sum over $\ell(k)$ is considered (see, e.g., Kirch [19]).
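As a minimal sketch of the maximum-type likelihood ratio statistic, consider the simplest special case of a mean change in independent $N(\theta, 1)$ observations. This toy model, the simulated data, and the function names `gaussian_loglik` and `lr_statistics` are assumptions made for illustration only. The index $k = n$ is omitted in the loop because the second segment is then empty and $\ell(n) = 0$.

```python
import numpy as np

def gaussian_loglik(y, theta):
    """Log-likelihood of i.i.d. N(theta, 1) observations."""
    y = np.asarray(y, dtype=float)
    return -0.5 * y.size * np.log(2.0 * np.pi) - 0.5 * np.sum((y - theta) ** 2)

def lr_statistics(y):
    """Return ell(k) for k = 1, ..., n-1 for a mean change in N(theta, 1) data."""
    y = np.asarray(y, dtype=float)
    n = y.size
    full = gaussian_loglik(y, y.mean())      # fit with one common parameter
    ell = np.empty(n - 1)
    for k in range(1, n):
        before, after = y[:k], y[k:]
        # The MLE of the mean in each segment is the segment average.
        ell[k - 1] = (gaussian_loglik(before, before.mean())
                      + gaussian_loglik(after, after.mean())
                      - full)
    return ell

# Simulated example (assumed data): mean shift of 1.5 after observation 100.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.5, 1.0, 100)])
ell = lr_statistics(y)
max_stat = ell.max()                 # maximum-type statistic
sum_stat = ell.sum()                 # sum-type counterpart
k_hat = int(np.argmax(ell)) + 1      # maximizing index, a candidate change point
```

In this special case each $\ell(k)$ requires two additional fits, which is cheap here but becomes costly when the estimators have to be computed numerically.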
Davis et al. [6] proposed this statistic for linear autoregressive processes of order p with
standard normal errors:

$$Y_t = \beta_0 + \sum_{j=1}^{p} \beta_j Y_{t-j} + \varepsilon_t, \qquad \varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, 1). \tag{10.1}$$
In this situation (which includes mean changes as the special case p = 0), this maximum
likelihood statistic does not converge in distribution to a nondegenerate limit but almost
surely to infinity (Davis et al. [6]). Nevertheless, asymptotic level α tests based on this
maximum likelihood statistic can be constructed using a Darling–Erdös limit theorem as
stated in Theorem 10.1b. In small samples, however, the slow convergence of
Darling–Erdös limit theorems often leads to some size distortions.
Similarly, one can construct Wald-type statistics based on maxima or sums of quadratic
forms of

$$W(k) := \hat\theta_k - \hat\theta_k^{\circ}, \qquad k = 1, \ldots, n.$$

Wald statistics can be generalized to any other estimation procedure for θ and are not
restricted to maximum likelihood estimators. However, for both maximum likelihood and
Wald statistics, the estimators $\hat\theta_k$ and $\hat\theta_k^{\circ}$ need to be
calculated, which can be problematic in nonlinear situations. In such situations, which are
typical for integer-valued time series, these estimators are usually not analytically
tractable, but need to be calculated using numerical optimization methods. This can lead to
additional computational effort to calculate the statistics or large numerical errors. The
latter problems can be reduced by using score-type statistics based on maxima or sums of
quadratic forms of

$$S(k) := S\bigl(k, \hat\theta_n\bigr) = \frac{\partial}{\partial \theta}\, \ell\bigl((Y_1, \ldots, Y_k), \theta\bigr)\Big|_{\theta = \hat\theta_n}, \qquad k = 1, \ldots, n.$$

In this case, only the estimator based on the full data set $Y_1, \ldots, Y_n$ needs to be
calculated (possibly using numerical methods). The likelihood score statistic for the linear
regression model has been investigated in detail by Hušková et al. [27]. Similarly to Wald
statistics, score statistics do not need to be likelihood based but can be generalized to
different estimators as long as those estimators can be obtained as a solution to

$$S\bigl(n, \hat\theta_n\bigr) = \sum_{t=1}^{n} F\bigl((Y_t, \mathbb{Y}_{t-1}), \hat\theta_n\bigr) = 0$$
for some estimating function F. Important estimators of this type are M estimators, which
have been used in the context of linear regression models to construct score-type change
point tests by Antoch and Hušková [1].
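To illustrate why score-type statistics are computationally attractive, the following sketch computes the likelihood score partial sums for independent Poisson observations with a constant intensity. The i.i.d. Poisson model, the simulated data, and the helper name `poisson_score_path` are assumptions chosen for simplicity; they are not one of the autoregressive models treated later in the chapter.

```python
import numpy as np

def poisson_score_path(y):
    """Score partial sums S(k), k = 1, ..., n, for i.i.d. Poisson(lam) data.

    The derivative of the log-likelihood with respect to lam is
    sum_t (y_t / lam - 1); it is evaluated at the full-sample MLE mean(y).
    """
    y = np.asarray(y, dtype=float)
    lam_hat = y.mean()                     # the only estimator that is needed
    return np.cumsum(y / lam_hat - 1.0), lam_hat

# Simulated example (assumed data): the intensity doubles after observation 150.
rng = np.random.default_rng(1)
y = np.concatenate([rng.poisson(2.0, 150), rng.poisson(4.0, 150)])
S, lam_hat = poisson_score_path(y)
k_hat = int(np.argmax(np.abs(S))) + 1      # index where |S(k)| is largest
```

Only one estimation step on the full sample is required, and the final partial sum is zero up to rounding because the estimator solves the estimating equation.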
In the linear autoregressive situation in (10.1), the likelihood-based statistics of the type
mentioned earlier are all equivalent. Specifically, some calculations yield

$$2\ell(k) = W(k)^T C_k C_n^{-1} C_k^{\circ}\, W(k) = S(k)^T C_k^{-1} C_n \bigl(C_k^{\circ}\bigr)^{-1} S(k),$$

where

$$C_k = \sum_{t=1}^{k} \mathbb{Y}_{t-1} \mathbb{Y}_{t-1}^T, \qquad C_k^{\circ} = \sum_{t=k+1}^{n} \mathbb{Y}_{t-1} \mathbb{Y}_{t-1}^T, \qquad \mathbb{Y}_{t-1} = (1, Y_{t-1}, \ldots, Y_{t-p})^T.$$

As already mentioned, the maximum likelihood statistic (hence the corresponding likelihood
Wald and score statistics) does not converge in distribution. Under the null hypothesis,
the matrix in the quadratic form of the likelihood score statistic can be approximated
asymptotically (as $k \to \infty$, $n - k \to \infty$, $n \to \infty$) by

$$C_k^{-1} C_n \bigl(C_k^{\circ}\bigr)^{-1} = \frac{1}{n} \left( \frac{k}{n} \left( 1 - \frac{k}{n} \right) \right)^{-1} \bigl( C^{-1} + o_P(1) \bigr), \qquad C = E\, \mathbb{Y}_{t-1} \mathbb{Y}_{t-1}^T. \tag{10.2}$$
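The equivalence of the likelihood ratio, Wald, and score quadratic forms in the linear autoregressive model (10.1) can be checked numerically. The sketch below does this for an autoregression of order one; the coefficient values, the sample size, the split point `k = 80`, and all variable names are hypothetical choices for the illustration. Additive constants of the Gaussian log-likelihood are dropped since they cancel in $\ell(k)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 1
beta = np.array([0.5, 0.3])                # hypothetical (beta_0, beta_1)
y = np.zeros(n + p)
for t in range(p, n + p):
    y[t] = beta[0] + beta[1] * y[t - 1] + rng.normal()

X = np.column_stack([np.ones(n), y[p - 1:n + p - 1]])   # rows are (1, Y_{t-1})
Y = y[p:]

def ols(Xs, Ys):
    """Least-squares estimator, which is the MLE under N(0, 1) errors."""
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ Ys)

def loglik(Xs, Ys, b):
    """Gaussian log-likelihood with unit variance, up to an additive constant."""
    r = Ys - Xs @ b
    return -0.5 * r @ r

k = 80
b_n = ols(X, Y)                                      # full-sample estimator
b_k, b_ko = ols(X[:k], Y[:k]), ols(X[k:], Y[k:])     # before / after k
C_k, C_ko, C_n = X[:k].T @ X[:k], X[k:].T @ X[k:], X.T @ X

two_ell = 2 * (loglik(X[:k], Y[:k], b_k) + loglik(X[k:], Y[k:], b_ko)
               - loglik(X, Y, b_n))
W = b_k - b_ko                                       # Wald difference
S = X[:k].T @ (Y[:k] - X[:k] @ b_n)                  # score at the full-sample fit
wald_form = W @ C_k @ np.linalg.solve(C_n, C_ko @ W)
score_form = S @ np.linalg.solve(C_k, C_n @ np.linalg.solve(C_ko, S))
```

The three numbers `two_ell`, `wald_form`, and `score_form` agree up to floating-point error, while only the score version avoids refitting the model on each segment.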
Replacing this term by $\frac{w(k/n)}{n}\, C^{-1}$ for a suitable weight function $w(\cdot)$
leads to a statistic that does converge in distribution. More precisely, we consider

$$\max_{1 \le k \le n} \frac{w(k/n)}{n}\, S(k)^T C^{-1} S(k),$$
where $w : [0, 1] \to \mathbb{R}_+$ is a nonnegative continuous weight function fulfilling

$$\lim_{t \to 0} t^{\alpha} w(t) < \infty, \qquad \lim_{t \to 1} (1 - t)^{\alpha} w(t) < \infty, \qquad \text{for some } 0 \le \alpha < 1,$$
$$\sup_{\eta \le t \le 1 - \eta} w(t) < \infty \quad \text{for all } 0 < \eta \le \tfrac{1}{2}. \tag{10.3}$$

Theorem 10.1a shows that this class of statistics converges, under regularity conditions,
in distribution to a nondegenerate limit. The following choice of weight function, closely
related to the choice of the weights in (10.2), has often been proposed in the literature:

$$w(t) = (t(1 - t))^{-\gamma}, \qquad 0 \le \gamma < 1,$$

where γ close to 1 detects early or late changes with better power. In the econometrics
literature, the following weight functions are often used, which correspond to a truncation
of the likelihood ratio statistic and can be viewed as the likelihood ratio statistic under
restrictions on the set of admissible change points,

$$w(t) = (t(1 - t))^{-1/2}\, 1_{\{\epsilon \le t \le 1 - \epsilon\}}$$

for some $\epsilon > 0$. Similarly, if a priori knowledge of the location of the change point
is available, one can increase the power of the designed test statistic for such alternatives
by choosing a weight function that is larger near these points (Kirch et al. [20]).
Nevertheless, these statistics have asymptotic power one for other change locations (see
Theorem 10.2).
Additionally, many change point statistics discussed in the literature do not use the full
score function but rather a lower-dimensional projection, where $C_n$ is replaced by a lower
rank matrix. For linear autoregressive models as in (10.1), for example, Kulperger [24] and
Horváth [25] use a partial sum process based on estimated residuals, which corresponds
to the first component of the likelihood score vector in this example.
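The weighted maximum-type statistic and the two families of weight functions described above can be sketched as follows. The function names, the default values `gamma=0.25` and `eps=0.1`, and the simulated mean-change data are assumptions for illustration. The weights are set to zero at the endpoints, which is harmless at $t = 1$ because $S(n) = 0$.

```python
import numpy as np

def weight_power(t, gamma=0.25):
    """w(t) = (t (1 - t))^(-gamma), 0 <= gamma < 1; set to 0 at t = 0 and t = 1."""
    t = np.asarray(t, dtype=float)
    w = np.zeros_like(t)
    inside = (t > 0) & (t < 1)
    w[inside] = (t[inside] * (1.0 - t[inside])) ** (-gamma)
    return w

def weight_truncated(t, eps=0.1):
    """w(t) = (t (1 - t))^(-1/2) on [eps, 1 - eps] and 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    w = np.zeros_like(t)
    inside = (t >= eps) & (t <= 1.0 - eps)
    w[inside] = (t[inside] * (1.0 - t[inside])) ** (-0.5)
    return w

def weighted_max_statistic(S, C_inv, weight):
    """max over k of w(k/n)/n * S(k)^T C^{-1} S(k) for a score path of shape (n, d)."""
    S = np.asarray(S, dtype=float)
    if S.ndim == 1:
        S = S[:, None]                               # treat a vector as d = 1
    n = S.shape[0]
    quad = np.einsum("ki,ij,kj->k", S, C_inv, S)     # quadratic form for every k
    values = weight(np.arange(1, n + 1) / n) * quad / n
    return float(values.max()), int(values.argmax()) + 1

# Simulated mean change (assumed data) with unit variance, so C^{-1} = 1.
rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(1.0, 1.0, 80)])
S = np.cumsum(y - y.mean())
stat, k_hat = weighted_max_statistic(S, np.eye(1), weight_power)
stat_trunc, k_trunc = weighted_max_statistic(S, np.eye(1), weight_truncated)
```

Passing the weight as a function keeps the statistic itself unchanged when a different weight, for instance one concentrated near a suspected change location, is preferred.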
For this reason, in the following, we do not require $S(k, \theta)$ to be the likelihood score
(nor even of the same dimension as θ), nor do we assume that $\hat\theta_n$ is the maximum
likelihood estimator. In fact, we allow for general score-type statistics that are based on
partial sum processes of the type

$$S\bigl(k, \hat\theta_n\bigr) = \sum_{j=1}^{k} H\bigl(\mathbb{X}_j, \hat\theta_n\bigr), \quad \text{with } S\bigl(n, \hat\theta_n\bigr) = 0 \text{ and } \hat\theta_n \to \theta_0, \tag{10.4}$$

where $\theta_0$ is typically the correct parameter, $\mathbb{X}_j$ are observations, where,
for example, for the autoregressive case of order one, a vector
$\mathbb{X}_j = (X_j, X_{j-1})^T$ is used, and $H$ is some function usually of the type $AF$
for some (possibly lower rank) matrix $A$ and an estimating function
$F$ that defines the estimator $\hat\theta_n$ as the unique zero of
$\sum_{j=1}^{n} F(\mathbb{X}_j, \hat\theta_n) = 0$. Furthermore, it is
possible to allow for misspecification, in which case $\theta_0$ becomes the best
approximating parameter in the sense of $E F(\mathbb{X}_j, \theta_0) = 0$. More details on
this framework in a sequential context can be found in Kirch and Kamgaing [22].
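The general construction with $H = AF$ can be sketched for the autoregressive case of order one mentioned above. Here the estimating function is the least-squares one, which is an assumption made for the illustration, as are the simulated real-valued data and the names `estimating_function`, `solve_estimator`, and `score_path`. Choosing $A = (1, 0)$ keeps only the first component, that is, the partial sums of the estimated residuals.

```python
import numpy as np

def estimating_function(x_pairs, theta):
    """Least-squares estimating function F(X_j, theta) for an AR(1) with intercept.

    x_pairs has rows (X_j, X_{j-1}); one row F(X_j, theta) is returned per j.
    """
    current, lagged = x_pairs[:, 0], x_pairs[:, 1]
    resid = current - theta[0] - theta[1] * lagged
    return np.column_stack([resid, resid * lagged])

def solve_estimator(x_pairs):
    """Zero of sum_j F(X_j, theta); for this F it is ordinary least squares."""
    design = np.column_stack([np.ones(len(x_pairs)), x_pairs[:, 1]])
    return np.linalg.solve(design.T @ design, design.T @ x_pairs[:, 0])

def score_path(x_pairs, theta_hat, A):
    """Partial sums S(k, theta_hat) = sum_{j <= k} A F(X_j, theta_hat), k = 1..n."""
    H = estimating_function(x_pairs, theta_hat) @ A.T
    return np.cumsum(H, axis=0)

# Simulated AR(1) data without a change (hypothetical parameter values).
rng = np.random.default_rng(4)
x = np.zeros(301)
for t in range(1, 301):
    x[t] = 1.0 + 0.4 * x[t - 1] + rng.normal()
x_pairs = np.column_stack([x[1:], x[:-1]])

theta_hat = solve_estimator(x_pairs)
S_full = score_path(x_pairs, theta_hat, np.eye(2))                 # A = identity
S_resid = score_path(x_pairs, theta_hat, np.array([[1.0, 0.0]]))   # residual sums
```

Both paths end at zero up to rounding, in line with the requirement $S(n, \hat\theta_n) = 0$ in (10.4).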
We are now able to derive the limit distribution of the corresponding score-type change
point tests under the null hypothesis under the regularity conditions given in Section 10.7.1.
These regularity conditions are implicitly shown in the proofs for change point tests of the
types mentioned earlier. Examples for integer-valued time series are given in Sections 10.3
and 10.4.
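Theorem 10.1 We obtain the following null asymptotics:
(a) Let A.1 and A.2 (i) in Section 10.7.1 hold. Assume that the weight function is either
a continuous nonnegative and bounded function $w : [0, 1] \to \mathbb{R}_+$, or for unbounded
functions fulfilling (10.3), let additionally A.2 (ii) in Section 10.7.1 hold. Then:

(i) $$\max_{1 \le k \le n} \frac{w(k/n)}{n}\, S\bigl(k, \hat\theta_n\bigr)^T \Sigma^{-1} S\bigl(k, \hat\theta_n\bigr) \xrightarrow{D} \sup_{0 \le t \le 1} w(t) \sum_{j=1}^{d} B_j^2(t),$$

(ii) $$\sum_{1 \le k \le n} \frac{w(k/n)}{n^2}\, S\bigl(k, \hat\theta_n\bigr)^T \Sigma^{-1} S\bigl(k, \hat\theta_n\bigr) \xrightarrow{D} \int_0^1 w(t) \sum_{j=1}^{d} B_j^2(t)\, dt,$$

where $B_j(\cdot)$, $j = 1, \ldots, d$, are independent Brownian bridges and $\Sigma$ can be
replaced by $\hat\Sigma_n$ if $\hat\Sigma_n - \Sigma = o_P(1)$.
(b) Under A.1 and A.3 in Section 10.7.1 it holds

$$P\left( a(\log n) \max_{1 \le k < n} \sqrt{\frac{n}{k(n - k)}\, S\bigl(k, \hat\theta_n\bigr)^T \Sigma^{-1} S\bigl(k, \hat\theta_n\bigr)} - b_d(\log n) \le t \right) \to \exp\bigl(-2 e^{-t}\bigr),$$

where $a(x) = \sqrt{2 \log x}$, $b_d(x) = 2 \log x + \frac{d}{2} \log\log x - \log \Gamma(d/2)$,
$\Gamma(\cdot)$ is the Gamma function, and $d$ is the dimension of the vector $S(k, \theta)$.
Furthermore, $\Sigma$ can be replaced by an estimator $\hat\Sigma_n$ if
$\bigl\| \hat\Sigma_n^{-1/2} - \Sigma^{-1/2} \bigr\| = o_P\bigl((\log\log n)^{-1}\bigr)$.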
The assumption of continuity of the weight function in (a) can be relaxed to allow for
a finite number of points of discontinuity, where w is either left or right continuous with
existing limits from the other side.
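Critical values for both parts of Theorem 10.1 can be obtained along the following lines. For part (b), the sketch inverts the Gumbel-type limit using the normalizing functions $a$ and $b_d$ from the theorem. For part (a)(i), it approximates the limit by Monte Carlo simulation of Brownian bridges on a grid. The function names, the grid size, the number of replications, and the constant weight used in the example are assumptions; the discretization makes the simulated quantiles approximate only.

```python
import math
import numpy as np

def darling_erdos_critical_value(n, d, alpha=0.05):
    """Critical value c for the max-type statistic in Theorem 10.1(b):
    rejecting when the statistic exceeds c has asymptotic level alpha."""
    x = math.log(n)
    a = math.sqrt(2.0 * math.log(x))
    b = 2.0 * math.log(x) + 0.5 * d * math.log(math.log(x)) - math.lgamma(d / 2.0)
    t = -math.log(-0.5 * math.log(1.0 - alpha))   # solves exp(-2 exp(-t)) = 1 - alpha
    return (t + b) / a

def simulate_sup_limit(weight, d, n_grid=500, n_rep=2000, seed=0):
    """Monte Carlo draws of sup_t w(t) * sum_j B_j(t)^2 on an equidistant grid."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_grid) / n_grid                    # interior grid points
    steps = rng.normal(scale=math.sqrt(1.0 / n_grid), size=(n_rep, d, n_grid))
    walks = np.cumsum(steps, axis=2)                     # Brownian motion on the grid
    bridges = walks[:, :, :-1] - t * walks[:, :, -1:]    # B(t) = W(t) - t W(1)
    return np.max(weight(t) * np.sum(bridges ** 2, axis=1), axis=1)

# Example: constant weight w = 1 and a one-dimensional score (d = 1).
draws = simulate_sup_limit(lambda t: np.ones_like(t), d=1)
crit_sup = float(np.quantile(draws, 0.95))               # level 5% for (a)(i)
crit_de = darling_erdos_critical_value(n=500, d=1)       # level 5% for (b)
```

A test based on (a)(i) rejects when the weighted maximum statistic exceeds `crit_sup`, and a test based on (b) rejects when the square-root statistic exceeds `crit_de`.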
Similarly, under alternatives, we provide some regularity conditions, which ensure
that the tests mentioned earlier have asymptotic power one. Additionally, we propose a
consistent estimator of the change point in rescaled time.
Theorem 10.2 Under alternatives with a change point of the form

$$k_0 = \lfloor \lambda n \rfloor, \qquad 0 < \lambda < 1, \tag{10.5}$$