4
Count Time Series with Observation-Driven
Autoregressive Parameter Dynamics
Dag Tjøstheim
CONTENTS
4.1 Introduction
4.2 Probabilistic Structure
    4.2.1 Random Iteration Approach
    4.2.2 The General Markov Chain Approach
    4.2.3 Coupling Arguments
    4.2.4 Ergodicity
    4.2.5 Weak Dependence
    4.2.6 Markov Theory with φ-Irreducibility
    4.2.7 Perturbation Method
4.3 Statistical Inference
    4.3.1 Asymptotic Estimation Theory without Perturbation
    4.3.2 The Perturbation Approach
4.4 Extensions
    4.4.1 Higher-Order Models
    4.4.2 Vector Models
    4.4.3 Generalizing the Poisson Assumption
    4.4.4 Specification and Goodness-of-Fit Testing
References
4.1 Introduction
A count time series $\{Y_t\}$ is a time series that takes its values in a subset $\mathbb{N}_0$ of the nonnegative integers $\mathbb{N}$. Most often this subset will be all of $\mathbb{N}$, but it can also be the case that, for example, $\mathbb{N}_0 = \{0, 1\}$ or $\mathbb{N}_0 = \{0, 1, 2, \ldots, k\}$, for a binary and a binomial time series, respectively.
In this chapter, we will look at count time series with dynamics driven by an autoregressive mechanism. This means that the distribution of $\{Y_t\}$ is modeled by a parametric distribution, for example, a Poisson distribution, whose parameters are assumed to be stochastic processes. The dynamics of the $\{Y_t\}$ process is created through a recursive autoregressive scheme for the parameter process. More precisely, in the first-order case,

$$P(Y_t = n \mid X_t) = p(n, X_t), \qquad (4.1)$$
where $\sum_{n \in \mathbb{N}_0} p(n, X_t) = 1$, $X_t$ is a parameter-driven process given by a possibly vector-valued and possibly nonlinear AR($p$)-type process

$$X_t = f(X_{t-1}, \ldots, X_{t-p}, \varepsilon_{t-1})$$

and $\{\varepsilon_t\}$ is a series of innovations or random shocks driving the process $\{X_t\}$. The parameter process is a genuine nonlinear autoregressive process if $\{\varepsilon_t\}$ consists of iid (independent identically distributed) random variables such that $\varepsilon_{t-1}$ is independent of $\mathcal{F}^X_{t-1}$, the $\sigma$-algebra generated by $\{X_s,\ s \le t-1\}$. In the terminology of Cox (1981), the process $\{Y_t\}$ is then a parameter-driven process; see, for example, Davis and Dunsmuir (2015; Chapter 6 in this volume). However, this is not the case for the processes we will mainly be concerned with. We will rather look at the class of processes obtained by replacing $\{\varepsilon_t\}$ by $\{Y_t\}$, so that

$$X_t = f(X_{t-1}, \ldots, X_{t-p}, Y_{t-1}) \qquad (4.2)$$

and the more general

$$X_t = f(X_{t-1}, \ldots, X_{t-p}, Y_{t-1}, \ldots, Y_{t-q}) \qquad (4.3)$$
with appropriate initial conditions. Clearly these are not genuine AR or ARMA processes because of the presence of lagged values of $\{Y_t\}$, which themselves depend on lagged values of $\{X_t\}$. In the terminology of Cox (1981), the resulting $\{Y_t\}$ processes are examples of observation-driven processes. We will concentrate our analysis on (4.1) and (4.2) because this pair yields a Markov structure more or less directly, whereas (4.1) and (4.3) require a redefinition of the state space to obtain a Markov structure. Such models have been widely used in applications recently. For specific applications and many references, the reader is referred to Fokianos (2015; Chapter 1 in this volume). In the current chapter, the emphasis will be on theory.
The main mathematical tool that has been used to handle the theory of these models is Markov chain theory. To see why, it is advantageous to rewrite the model slightly and at the same time make it more precise. To this end, let $\{N_t\}$ be a sequence of nonnegative integer-valued random variables that are independent given $\{X_t\}$ and have the probability distribution function $p(\cdot, X_t)$. If $\{X_t\}$ is nonrandom and equal to a constant, then $\{N_t\}$ is an iid sequence. In the general case we can write

$$Y_t = N_t(X_t).$$
This means that as we move from $t-1$ to $t$, first we obtain a value of $X_t$ from (4.2), again with appropriate initial conditions. Then, given $X_t$, there is a separate and independent random mechanism by which $N_t$, and as a result $Y_t$, is drawn from $p(\cdot, X_t)$. This makes $\{X_t\}$ a $p$th-order Markov chain with respect to the $\sigma$-field $\{\mathcal{F}^X_t\} = \{\sigma(X_s,\ s \le t)\}$, and $\{(X_t, Y_t)\}$ is a Markov chain on $\{\mathcal{F}^{X,Y}_t\} = \{\sigma(X_s,\ s \le t;\ Y_u,\ u \le t)\}$.
Perhaps the simplest example of such a process is a first-order Poisson autoregression with $X_t = \lambda_t$, where $\lambda_t$ is a scalar Poisson intensity parameter, and where

$$Y_t = N_t(\lambda_t). \qquad (4.4)$$
Here $\{N_t(\cdot)\}$ could be looked at as a sequence of independent Poisson processes of unit intensity, and where

$$\lambda_t = d + a\lambda_{t-1} + bY_{t-1} \qquad (4.5)$$
with $a$, $b$, $d$ being nonnegative unknown scalars. This kind of model was treated in Fokianos et al. (2009) and other papers referred to in that paper. To start the recursion, initial values of $\lambda_0$ and $Y_0$ are needed. The process defined by (4.4) and (4.5) is often compared to a GARCH process (see, for example, Francq and Zakoïan 2011),
$$Y_t = h_t^{1/2}\,\varepsilon_t \qquad (4.6)$$
with $h_t$ being the conditional variance of $Y_t$ given its past, and where $h_t$ is given by a recursive equation

$$h_t = d + ah_{t-1} + bY_{t-1}^2. \qquad (4.7)$$
Here of course $h_t$ corresponds to $\lambda_t$, and the series of iid random variables $\{\varepsilon_t\}$ with mean zero and variance 1 corresponds to the series of Poisson processes $\{N_t(\cdot)\}$ of unit intensity in (4.4). There are two problems which make the analysis of (4.4), (4.5) more difficult than the analysis of the GARCH system (4.6), (4.7): (1) $\{\lambda_t\}$ is driven by integer-valued innovations $\{Y_t\}$, whereas $\{\lambda_t\}$ itself, as an intensity parameter, is continuous valued; in the GARCH situation, both $Y_t$ and $h_t$ are usually taken to be continuous valued, although there are exceptions (Francq and Zakoïan 2004). (2) The quite innocent-looking relationship (4.4) in fact represents a complex nonlinear structure compared to the multiplicative structure of (4.6). These two problems will be discussed throughout the paper, as they are at the core of more or less everything that concerns these processes.
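To make the linear recursion (4.4), (4.5) concrete, here is a minimal simulation sketch in Python; the parameter values and sample size are illustrative choices, not taken from the text.

```python
import numpy as np

def simulate_linear_poisson_ar(d, a, b, n, lam0=1.0, y0=0, seed=0):
    """Simulate the first-order linear Poisson autoregression:
    lambda_t = d + a*lambda_{t-1} + b*Y_{t-1}   (4.5)
    Y_t ~ Poisson(lambda_t)                     (4.4)
    """
    rng = np.random.default_rng(seed)
    lam = np.empty(n)
    y = np.empty(n, dtype=int)
    lam_prev, y_prev = lam0, y0
    for t in range(n):
        lam[t] = d + a * lam_prev + b * y_prev  # intensity recursion (4.5)
        y[t] = rng.poisson(lam[t])              # conditionally Poisson draw (4.4)
        lam_prev, y_prev = lam[t], y[t]
    return lam, y

# Illustrative parameters; a + b < 1 keeps the mean recursion stable.
lam, y = simulate_linear_poisson_ar(d=0.5, a=0.3, b=0.4, n=1000)
```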
The analogy with the GARCH structure has led to the acronym INGARCH (integer generalized autoregressive conditional heteroscedastic) for these processes, but this is an acronym that I find unfortunate, since (4.4) is not in general a variance property. It may be difficult to change the terminology at the present point in time. I would prefer INGAR (integer generalized autoregressive processes). In an early version of the model, it was called Bin(1) by Rydberg and Shephard (2001).
The Poisson distribution is of course just one example of a distribution $p(\cdot, X_t)$. Other examples are a binary probability, where the success probability process $\{p_t\}$ would serve as the parameter process (Wang and Li 2011), or a binomial or negative binomial distribution. The case of the negative binomial has been treated by Christou and Fokianos (2014), Davis and Wu (2009), and Davis and Liu (2014), and we will return to processes governed by this distribution later.
When it comes to a specification of (4.1) through (4.3), there are two main categories of choices. The first and most obvious one has to do with the choice of the function $f$ in (4.2) and (4.3). One special case is the first-order linear model (4.5). A nonlinear additive model with higher-order lags is the specification

$$f(X_{t-1}, \ldots, X_{t-p}, Y_{t-1}, \ldots, Y_{t-q}) = \sum_{i=1}^{p} g_i(X_{t-i}) + \sum_{i=1}^{q} h_i(Y_{t-i}) \qquad (4.8)$$
for some (usually nonnegative) functions $\{g_i\}$ and $\{h_i\}$. So far, I do not know of any systematic attempts to analyse models such as (4.8), except in the case where the $g_i$'s and $h_i$'s are linear functions (a special case written out below) and in some special nonlinear models treated by Davis and Liu (2014). Clearly there are many other possibilities, and in the course of this paper we will put various restrictions on $f$.
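For orientation, the linear special case of (4.8), with $g_i(x) = a_i x + d_i$ and $h_i(y) = b_i y$, is the higher-order analogue of (4.5); this display is my addition, spelling out a standard specialization:

$$X_t = d + \sum_{i=1}^{p} a_i X_{t-i} + \sum_{i=1}^{q} b_i Y_{t-i}, \qquad d = \sum_{i=1}^{p} d_i,$$

with $d$, $a_i$, $b_i$ nonnegative in the intensity parametrization.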
The other category of specification has to do with the choice of the distribution $p$ in (4.1) and the choice of parametrization of this distribution. The parametrization is not unique and may be tied to the characterization of the exponential family of distributions, where there are canonical choices of parameters. For example, instead of the intensity $\lambda_t$ as the parameter, one could choose the canonical parameter $\nu_t = \ln \lambda_t$ in the Poisson case. For a binary parameter process with probability parameter $p_t$, one could choose the parameter $\alpha_t = \ln\{p_t/(1 - p_t)\}$. One obvious advantage of using $\nu_t$ instead of $\lambda_t$ in the Poisson linear case is that in an equation

$$\nu_t = d + a\nu_{t-1} + b \ln(Y_{t-1} + 1)$$

corresponding to (4.5), the parameters $d$, $a$, $b$ are no longer required to be nonnegative, since $\nu_t$ itself can take negative values, and in a sense it is easier to incorporate explanatory variables. We have treated such processes in Fokianos and Tjøstheim (2011).
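A minimal Python sketch of this log-linear recursion, with illustrative coefficients of my choosing; note that $b$ may be negative here, unlike in (4.5).

```python
import numpy as np

def simulate_loglinear_poisson_ar(d, a, b, n, nu0=0.0, y0=0, seed=0):
    """Simulate the log-linear Poisson autoregression:
    nu_t = d + a*nu_{t-1} + b*ln(Y_{t-1} + 1),  Y_t ~ Poisson(exp(nu_t)).
    """
    rng = np.random.default_rng(seed)
    nu, y_prev = nu0, y0
    y = np.empty(n, dtype=int)
    for t in range(n):
        nu = d + a * nu + b * np.log(y_prev + 1)  # canonical parameter nu_t = ln(lambda_t)
        y[t] = rng.poisson(np.exp(nu))            # intensity recovered as exp(nu_t)
        y_prev = y[t]
    return y

# No sign constraints on d, a, b are needed in this parametrization.
y = simulate_loglinear_poisson_ar(d=0.2, a=0.5, b=-0.3, n=1000)
```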
We will be concerned both with the probabilistic structure of the system (4.1), (4.2) and with the asymptotic theory of inference for the parameters characterizing the autoregressive process $\{X_t\}$. Examples of parameters that have to be estimated are $a$, $b$, $d$ in the linear case (4.5).
Somewhat different techniques have been used to characterize the probabilistic structure, but common to most of them is Markov chain theory. This is of course an integral part of the theory of (nonlinear) AR processes (see Tjøstheim 1990), but it is made more difficult and nonstandard in the present case by the incompatibility of the values of $\{X_t\}$ or $\{\lambda_t\}$ as compared to $\{Y_t\}$.
One technique for obtaining asymptotic results for the parameter estimates uses standard Markov chain theory by perturbing the original recursive relationship (4.2) or (4.5) with a continuously distributed perturbation $\varepsilon_t$, so that in the linear case one obtains the perturbed equation

$$\lambda_t = d + a\lambda_{t-1} + bY_{t-1} + \varepsilon_t \qquad (4.9)$$

and then letting this perturbation tend to zero. This is perhaps not as direct an approach as one could wish for (cf. Doukhan 2012 for a critical view), but it leads relatively efficiently to results (Fokianos et al. 2009). A disadvantage of this approach is that one is concerned not so much with the probabilistic structure of the processes $(\{X_t\}, \{Y_t\})$ themselves as with that of the perturbed versions. To look at the probabilistic structure of (4.1), (4.2), and more specifically (4.4), (4.5), there are again different approaches. We will highlight all of this as we proceed.
As mentioned, we will cover both the probabilistic structure and the theory of inference, but the emphasis will be on the former, because there are recent review papers, Fokianos (2012), Tjøstheim (2012), and Fokianos (2015; Chapter 1 in this volume), with a focus on statistical inference and applications. We will start with a discussion of the existence of a stationary, that is, invariant, probability measure for $(\{X_t\}, \{Y_t\})$. This problem is fundamental for most of what follows, such as ergodicity, irreducibility, and recurrence, and hence for statistical inference. These topics are treated in Section 4.2. We draw the connection to
consistency and asymptotic theory of parameter estimates in Section 4.3 and mention some
extensions in Section 4.4.
4.2 Probabilistic Structure
4.2.1 Random Iteration Approach
The recursive system (4.1), (4.2) can be looked at as a random iteration scheme. General random iteration schemes have been studied, among others, by Diaconis and Freedman (1999). They look at an iterative scheme which in our notation can most conveniently be written as

$$X_0 = x, \qquad X_1 = f_{N_0}(x),$$

or generally as

$$X_{t+1} = f_{N_t}(X_t). \qquad (4.10)$$
In the context of our system (4.1), (4.2), $N_0, N_1, \ldots$ can be thought of as iterative and independent drawings from the distribution $p$, or in the linear Poisson case as independent drawings from the Poisson processes of unit intensity; that is, from the Poisson distribution with intensity parameter 1. The $\{X_t\}$ process of (4.10) can be directly identified with the $\{X_t\}$ process of (4.2) or the $\lambda_t$ of (4.5). The $\{Y_t\}$ process of (4.1) and of (4.4) is implicitly a part of (4.10) through the drawings from the distribution function $p$.
In the setup of Diaconis and Freedman (1999), $\{X_t\}$ has as its state space $S$, a complete metric space with a metric $\rho$. In the bulk of their paper, and in particular in their Theorems 1.1 and 5.1, $p$ is not allowed to depend on $x \in S$, but they state (p. 49) that "Theorem 1.1 can be extended to cover $p$ that depends on $x$, but further conditions are needed." In our setup, most of the time $\{X_t\}$ (or $\{\lambda_t\}$) would have $\mathbb{R}^1_+$ as its state space. Note that in order to use the results of Diaconis and Freedman, the random mechanism $N_t$ should not depend on $X_t$. For the Poisson setup in (4.4), (4.5), this is obtained by letting $N_t$ be the realizations of Poisson processes of unit intensity. In, for example, Davis and Liu (2014), it is obtained by setting $X_t = E(Y_t \mid \mathcal{F}_{t-1})$ with $\mathcal{F}_{t-1} = \sigma(X_0, Y_0, \ldots, Y_{t-1})$, and by considering the inverse of the cumulative distribution function of an exponential family distribution of $Y$.
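To illustrate the device of making the random mechanism independent of $X_t$, here is a sketch in which $Y_t$ is produced by feeding iid uniforms through the inverse Poisson cdf, so that the same innovation sequence drives the chain whatever the current intensity. This illustrates the general idea only; it is not claimed to be the exact construction of Davis and Liu (2014).

```python
import numpy as np
from scipy.stats import poisson

def simulate_via_inverse_cdf(d, a, b, n, lam0=1.0, y0=0, seed=0):
    """Drive the linear Poisson autoregression (4.5) with iid uniforms:
    Y_t = F^{-1}(U_t; lambda_t), F the Poisson(lambda_t) cdf,
    so that the innovations U_t do not depend on lambda_t."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)            # iid innovations, independent of the chain
    lam, y_prev = lam0, y0
    y = np.empty(n, dtype=int)
    for t in range(n):
        lam = d + a * lam + b * y_prev  # intensity recursion (4.5)
        y[t] = poisson.ppf(u[t], lam)   # inverse-cdf draw: Y_t ~ Poisson(lam)
        y_prev = y[t]
    return y

y = simulate_via_inverse_cdf(d=0.5, a=0.3, b=0.4, n=1000)
```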
Theorems 1.1 and 5.1 of Diaconis and Freedman (1999) both give sufficient conditions for the existence of a unique stationary measure for the Markov chain $\{X_t\}$. This in turn is an essential condition for establishing limit results and a theory of inference for parameter estimates. The conditions of their Theorem 1.1 are somewhat more restrictive than those of Theorem 5.1, but they are easier to formulate and understand intuitively:
First, the functions $f_N(\cdot)$ are supposed to be Lipschitz such that

$$\rho(f_N(x), f_N(y)) \le C_N\,\rho(x, y) \qquad (4.11)$$
for some $C_N$ and all $x$ and $y$ in $S$. In fact, $f$ is assumed to be contracting on average, since it is assumed that $\int \ln C_N\, p(dN) < 0$ (which is a sum in our case, since $p$ is discrete). This makes $C_N < 1$ for a typical $N$. The statement in (4.11) is the statement of the stationarity condition
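To indicate how the contraction-on-average idea plays out for the linear model (4.5), here is a related mean-contraction computation (my addition for orientation; it is an $L^1$ analogue of (4.11) rather than the pathwise Lipschitz bound itself). With $f_N(\lambda) = d + a\lambda + bN(\lambda)$, $N$ a unit-intensity Poisson process, and $\lambda \ge \lambda'$,

$$E\,\big|f_N(\lambda) - f_N(\lambda')\big| \le a(\lambda - \lambda') + b\,E\big|N(\lambda) - N(\lambda')\big| = (a + b)(\lambda - \lambda'),$$

since $N(\lambda) - N(\lambda') \sim \mathrm{Poisson}(\lambda - \lambda')$ has mean $\lambda - \lambda'$. The chain thus contracts in mean when $a + b < 1$, the familiar stationarity condition for (4.5).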