4.2.3 Coupling Arguments
An alternative but overlapping approach to proving existence of a stationary measure pro-
ceeds via coupling arguments. This is the approach of Neumann (2011). See also Franke
(2010). Neumann works with the quite general nonlinear recursive system,
Y_t | F_{t-1}^{Y,λ} ∼ Poisson(λ_t),    λ_t = f(λ_{t-1}, Y_{t-1}),    (4.16)
which clearly is also related to the iterative system (4.10). The function f is supposed to
satisfy the contractive condition,
|f(λ, y) - f(λ′, y′)| ≤ α_1 |λ - λ′| + α_2 |y - y′|,    (4.17)
so that α = α_1 + α_2 < 1. This is essentially the sort of condition mentioned for the nonlinear
additive system (4.15), and the same method as was used to prove the weak Feller property
for the linear system can be used here, whereas Chebyshev’s inequality can be used to
prove boundedness in probability (tightness). Theorem 12.1.2 (ii) of Meyn and Tweedie
(2009) then implies the existence of a stationary measure. (Note that this is different from
Theorem 4.1.)
If we let Q_t^x be the conditional distribution of (Y_t, λ_t) given λ_0 = x, then {Q_t^x, t ∈ N} is tight. Hence, there is a subsequence {t_k, k ∈ N} such that Q_{t_k}^x converges weakly to some probability measure π_x as k → ∞. Coupling arguments are now used to show that this limit does not depend on the starting value x, and that the full sequence {Q_t^x, t ∈ N} converges. This implies the existence of a unique stationary distribution.
A brief indication of the coupling argument is as follows: In addition to Y_t, we construct another process Y_t′ with another starting value y, but other than that its probability generating mechanism is identical to that of Y_t. Using general properties of the Poisson process and the contractivity condition, one obtains

E(|λ_t - λ_t′| | x, y) ≤ α^t |x - y|

and

E(|Y_t - Y_t′| | x, y) ≤ α^t |x - y|,
and tightness arguments again can be used to complete the proof. Douc et al. (2013)
use coupling techniques in addition to the asymptotic strong Feller criterion of Woodard
et al. (2011).
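To make the coupling construction concrete, here is a minimal Monte Carlo sketch (not taken from the cited papers) for the linear specification f(λ, y) = d + aλ + by with illustrative values d = 0.5, a = 0.4, b = 0.3, so that α = a + b = 0.7. The two copies share the driving Poisson randomness through a thinning construction, and the estimated mean gap E(|λ_t - λ_t′| | x, y) is compared with the bound α^t |x - y|. All function names and parameter values are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def coupled_step(lam, lam2, d=0.5, a=0.4, b=0.3):
    """One step of two coupled copies of Y_t ~ Poisson(lambda_t),
    lambda_t = d + a*lambda_{t-1} + b*Y_{t-1}, sharing the same
    underlying Poisson randomness via a thinning construction."""
    base = rng.poisson(min(lam, lam2))    # points lying below both intensities
    extra = rng.poisson(abs(lam - lam2))  # points between the two intensities
    y = base + (extra if lam >= lam2 else 0)
    y2 = base + (extra if lam2 > lam else 0)
    return d + a * lam + b * y, d + a * lam2 + b * y2

def mean_gap(t_max, x=1.0, y=8.0, n_rep=2000):
    """Monte Carlo estimate of E(|lambda_t - lambda'_t| | x, y)."""
    gaps = np.zeros(t_max + 1)
    for _ in range(n_rep):
        lam, lam2 = x, y
        for t in range(1, t_max + 1):
            lam, lam2 = coupled_step(lam, lam2)
            gaps[t] += abs(lam - lam2)
    return gaps / n_rep

gap = mean_gap(t_max=15)
alpha = 0.7   # a + b, the contraction coefficient
for t in (1, 5, 10, 15):
    # estimated mean gap versus the theoretical bound alpha^t * |x - y|
    print(t, gap[t], alpha ** t * abs(1.0 - 8.0))
```

Under these assumed values the estimated gap decays geometrically and stays below the bound, illustrating how the coupling and the contraction condition work together.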
4.2.4 Ergodicity
Existence of a stationary measure is not enough to develop an appropriate law of large numbers and asymptotic distribution theory of parameter estimates. Ergodicity is needed for consistency, and often ergodicity with an additional rate of convergence to the stationary distribution, such as geometric ergodicity, is required. This in turn can be linked to mixing and mixing rates and ultimately to central limit results. Since it turns out that strong mixing cannot in general be established for the models we look at, other mixing concepts such as weak dependence, to be defined in Section 4.2.5, may be useful. Alternatively, perturbation arguments can be used to obtain strong mixing.
In the following, we will just state a few facts about ergodicity and mixing in our context. Ergodicity is of course treated at length in the book by Meyn and Tweedie for many types of Markov chains, but most of this material refers to the measure theoretic framework where irreducibility is available, and not so much to our more general situation, where the discreteness of the observations {Y_t} used as innovations creates difficulties. Here, we will look at the models given by (4.1), (4.2), and (4.4), (4.5) and then in Section 4.2.7 the perturbed class of models, where ergodicity is much easier to handle.
It should be noticed that “ergodicity” is used with somewhat different meanings in
dynamic system theory and in Markov chain theory. In dynamic system theory a strictly
stationary process is ergodic if an invariant set for the shift operator either has probabil-
ity 0 or 1. This is the sense in which ergodicity is used in Douc et al. (2013), Theorem 3.2.
Also, this is how it is mostly used in time series, possibly non-Markovian, and where it is
employed to prove almost sure convergence of parameter estimates. The basis is of course
Birkhoff’s famous ergodic theorem, Birkhoff (1931), providing a law of large numbers for
ergodic strictly stationary processes.
In Markov chain theory, however, ergodicity is used to characterize the convergence of the t-step transition probability measure P^t(x, dy) to π(dy), assuming that there exists an invariant measure π. Ergodicity for a Markov chain can then be defined as

lim_{t→∞} |P^t(x, y) - π(y)| = 0

pointwise in the countable state space case, and

lim_{t→∞} ||P^t(x, ·) - π(·)|| = 2 lim_{t→∞} sup_{A ∈ B(S)} |P^t(x, A) - π(A)| = 0

in the general state space case, where ||·|| is the total variation norm. One can prove ergodicity of (Y_t, λ_t) for the quite general system (4.16). The line of argument, due to Neumann (2011), is as follows:
It is assumed that (4.16) and (4.17) hold, that a stationary invariant measure exists, and that {Y_t, λ_t} is in its stationary regime. The count process {Y_t} is then absolutely regular. It is well known that this implies strong mixing of {Y_t} with a geometric rate. On the other hand, from Neumann (2011) or Bradley (2007), Proposition 2.8, absolute regularity of {Y_t} implies ergodicity of {Y_t}. Since we can express λ_t = f(Y_{t-1}, Y_{t-2}, ...) almost surely, Proposition 2.10 (ii) in Bradley (2007) implies that the bivariate process {Y_t, λ_t} is also ergodic. But unfortunately the bivariate process {Y_t, λ_t} is not strongly mixing in general; see Neumann (2011), Remark 3, for a counterexample. This is again caused by the combination of a discrete {Y_t} and a continuous-valued {λ_t}. Davis and Liu (2014) have extended this argument and the ergodicity results to their more general setting.
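As a purely numerical illustration of ergodicity in the Markov chain sense (not a result from the cited papers), the sketch below simulates the linear model (4.4), (4.5) with assumed parameter values from two very different starting intensities and estimates the total variation distance between the two empirical distributions of Y_t; up to Monte Carlo error, the distance should shrink toward zero as t grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_y_t(lam0, t, d=0.5, a=0.4, b=0.3, n_paths=20000):
    """Draw Y_t from the chain Y_s ~ Poisson(lambda_s),
    lambda_s = d + a*lambda_{s-1} + b*Y_{s-1}, started at lambda_0 = lam0."""
    lam = np.full(n_paths, lam0)
    y = rng.poisson(lam)
    for _ in range(t - 1):
        lam = d + a * lam + b * y
        y = rng.poisson(lam)
    return y

def tv_distance(sample1, sample2):
    """Total variation distance between two empirical pmfs on the integers."""
    hi = max(sample1.max(), sample2.max()) + 1
    p = np.bincount(sample1, minlength=hi) / len(sample1)
    q = np.bincount(sample2, minlength=hi) / len(sample2)
    return 0.5 * np.abs(p - q).sum()

for t in (1, 2, 5, 10, 20):
    y_low = simulate_y_t(lam0=0.5, t=t)
    y_high = simulate_y_t(lam0=10.0, t=t)
    print(t, tv_distance(y_low, y_high))  # shrinks towards 0 as t grows
```

Only the marginal law of Y_t is compared here; this is a simplification for illustration, not the full convergence statement for (Y_t, λ_t).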
4.2.5 Weak Dependence
We have repeatedly pointed out the difficulties in proving strong mixing for the {λ_t} process due to the discreteness of the {Y_t} process. It is an obstacle to the derivation of an asymptotic theory of parameter estimates. Similar problems occur in other fields of statistics, such as bootstrapping, where one samples from a discrete distribution. Doukhan and Louhichi (1999) invented the concept of weak dependence to get around these difficulties. The concept of weak dependence was especially designed to handle situations where there is no strong mixing.
The concept of weak dependence is perhaps best approached by consulting Doukhan and Neumann (2008) or the monograph Dedecker et al. (2007), which is partly a survey. These references also mention alternative concepts (Bickel and Bühlmann 1999; Dedecker and Prieur 2005) that have properties analogous to the weak dependence concept. In Doukhan and Neumann (2008), weak dependence is defined as follows (Z is the set of all integers, N is the set of positive integers):
A process {Y_t}, t ∈ Z, is called ψ-weakly dependent if there exists a universal null sequence {ε_r}, r ∈ N, such that for any k-tuple (s_1, ..., s_k) and any m-tuple (t_1, ..., t_m) with s_1 ≤ ··· ≤ s_k < s_k + r = t_1 ≤ ··· ≤ t_m and arbitrary measurable functions g : R^k → R, h : R^m → R with ||g||_∞ ≤ 1 and ||h||_∞ ≤ 1, the following inequality is fulfilled:

|cov(g(Y_{s_1}, ..., Y_{s_k}), h(Y_{t_1}, ..., Y_{t_m}))| ≤ ψ(k, m, Lip g, Lip h) ε_r.
Here Lip h denotes the Lipschitz modulus of continuity of h, that is,

Lip h = sup_{x ≠ y} |h(x) - h(y)| / ||x - y||_{l_1},

where ||(z_1, ..., z_m)||_{l_1} = Σ_i |z_i|, and ψ : N^2 × R_+^2 → [0, ∞) is a function to be specified.
This general denition has subsequently been rened by appropriate choices of ψ yield-
ing variants of weak ψ-dependence as explained in Doukhan and Neumann (2008). It has
been shown that weak dependence generalizes classic mixing criteria in a number of situ-
ations, and that central-limit-type results can be derived in several cases where there is a
discrete innovation process such as for Markov processes driven by discrete innovations
and bootstrap versions of linear AR processes.
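A rough numerical check of the covariance decay in the definition can be made for the linear count model with one-dimensional test functions (k = m = 1) and bounded Lipschitz choices of g and h. The snippet below is only an illustrative sketch with assumed parameter values; it is not a formal verification of ψ-weak dependence.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate one long path of the linear Poisson autoregression (4.4)-(4.5)
# with illustrative parameters d, a, b (assumed values, a + b < 1).
d, a, b, n = 0.5, 0.4, 0.3, 200000
lam, y = d / (1 - a - b), 0
y_path = np.empty(n, dtype=int)
for t in range(n):
    lam = d + a * lam + b * y
    y = rng.poisson(lam)
    y_path[t] = y

# Bounded Lipschitz test functions: ||g||_inf <= 1 and Lip g <= 1.
def g(x):
    return np.tanh(x - 2.0)

def h(x):
    return np.tanh(0.5 * x)

burn = 1000
for r in (1, 2, 5, 10, 20):
    past = g(y_path[burn:n - r])
    future = h(y_path[burn + r:])
    cov = np.mean(past * future) - past.mean() * future.mean()
    print(r, abs(cov))   # the estimated covariance decays quickly in the gap r
```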
To show that a certain process exhibits weak dependence, a contraction argument seems generally to be most efficient. In turn, the contractivity can often be proved by a coupling device. There are some applications of weak dependence to AR count processes (Franke 2010 and Doukhan et al. 2012, 2013; see also Neumann 2011), and they show weak dependence by such arguments.
The applications to count time series so far have mainly been limited to demonstrat-
ing that certain models, linear and some nonlinear, are weakly dependent (Doukhan et al.
2012, 2013, Franke 2010, and the related paper by Neumann 2011). Once weak depen-
dence is proved, it is still not trivial to establish an asymptotic theory that can be used
to study asymptotics of parameter estimates for the models we are considering. The paper
by Bardet and Wintenberger (2009) could be a useful starting point. This paper uses weak
dependence to obtain asymptotic normality of the quasi-maximum likelihood estimator for
a multidimensional causal process. See also Doukhan et al. (2012, 2013), and Christou and
Fokianos (2014).
Note that unlike much of the theory presented in this paper, the concept of weak dependence does not require a Markov framework. This is of course also the case with the mixing concepts. Franke (2010) uses contraction to prove weak dependence for non-Markovian, but Markov-type, models where

λ_t = d + Σ_i a_i λ_{t-i} + Σ_i b_i Y_{t-i}.
In Doukhan et al. (2012) it was conjectured that weak dependence also holds for nonlinear ARMA(∞, ∞) models. Unfortunately, as stated in the correction Doukhan et al. (2013) of
that paper the attempted proof of this fact in Doukhan et al. (2012) is not correct. Note that
ARMA models are also treated by Woodard et al. (2011), who transform them to a Markov
framework.
4.2.6 Markov Theory with φ-Irreducibility
Much of the theory of Meyn and Tweedie (2009) is concerned with the measure theoretic aspects of Markov chains, and there is a huge literature on continuous-valued Markov chains where this concept is exploited. Here we just comment very briefly on these developments, partly to put the preceding developments into perspective and partly as a prelude to the theory of the perturbed models presented next. Let us start by formally defining φ-irreducibility:
A Markov chain is φ-irreducible if there is a nontrivial measure φ on {S, B(S)} such that whenever φ(A) > 0, then P^t(x, A) > 0 for some t = t(x, A) ≥ 1 for all x ∈ S. The measure φ is usually assumed to be a maximal irreducibility measure; see Meyn and Tweedie (2009).
A few comments about the relationship between φ-irreducibility and the other concepts we have discussed are in order. First, from Meyn and Tweedie (2009), Proposition 6.1.5, if {X_t} is strong Feller and S contains one reachable point x_0, then {X_t} is φ-irreducible with φ = P(x_0, ·). So-called T-chains are somewhat in between weak and strong Feller chains. A T-chain is defined by Meyn and Tweedie (2009), p. 124, as a chain containing a continuous component (made precise by Meyn and Tweedie; see also the example in Nummelin 1984, p. 12). The continuous component defines a transition probability denoted by T, and if {X_t} is a T-chain and {X_t} contains a reachable point x_0, then {X_t} is φ-irreducible with φ = T(x_0, ·). By perturbing the chains (4.1), (4.2) and (4.4), (4.5) with a random sequence having a density absolutely continuous with respect to Lebesgue measure on some set A ∈ B(S) of positive Lebesgue measure, we essentially obtain a T-chain and hence φ-irreducibility, because proving the existence of a reachable point is in general not difficult.
When φ-irreducibility holds, one can define recurrence, positive recurrence, Harris recurrence, and geometric ergodicity. Note that geometric ergodicity, in contradistinction to the ergodicity discussed earlier, requires φ-irreducibility. Appendix A in Meyn and Tweedie (2009) is a useful compressed source for most of these concepts. The difference between merely having the existence of a stationary measure and having positive recurrence is highlighted in Theorems 17.1.2 and 17.1.7 in Meyn and Tweedie (2009).
4.2.7 Perturbation Method
The φ-irreducible Markov chain theory can now be illustrated on the models (4.1), (4.2),
(4.4), (4.5) when they are perturbed as in (4.9), so that the innovations have a continuous
component, effectively making this into a T-chain. We will stick to the linear model (4.4),
(4.5) in this subsection and consider more general models in the next section on inference.
As indicated in the preceding subsection, φ-irreducibility cannot be used for (4.1), (4.2) and (4.4), (4.5). For example, if d, a, and b in (4.5) are rational numbers, then {λ_t} will stay on the rational numbers if λ_0 is rational, and if φ is taken to be Lebesgue measure, a natural choice, {λ_t} is not φ-irreducible. However, by perturbing it, it may be made into a φ-irreducible process. In the linear model case this was done in Fokianos et al. (2009) by adding a continuous perturbation, obtaining a new process {Y_t^m},
Y_t^m = N_t(λ_t^m),    λ_t^m = d + a λ_{t-1}^m + b Y_{t-1}^m + ε_{t,m},    (4.18)
ε_{t,m} = c_m 1(Y_{t-1}^m = 1) U_t,    c_m > 0,    c_m → 0 as m → ∞,    (4.19)
where 1(·) is the indicator function and {U_t} is a sequence of iid uniform variables on [0, 1] such that U_t is independent of N_t(·). One can then prove φ-irreducibility and geometric ergodicity as in the proof of Lemma A1 and Proposition 2.1 in Fokianos et al. (2009). The perturbation in (4.18) is a purely auxiliary device to obtain φ-irreducibility. The U_t's could be thought of as pseudo-observations generated by the uniform law. The perturbation can be introduced in many other ways. It is enough to let {U_t} be an iid sequence of positive random variables possessing a density on the positive real axis and having a bounded support starting at zero. In addition, as will be seen in the next section, the likelihood functions for {Y_t} and {Y_t^m}, as far as dependence on {λ_t} and {λ_t^m} is concerned, will be the same for models (4.4), (4.5) and (4.18), (4.19). It will be noted that both {Y_t} and {Y_t^m} can be identified with the observations in the expression for the likelihood, but they cannot be identified as stochastic variables since they are generated by different models.
The simple condition 0 < a+b < 1 implies geometric ergodicity of the perturbed process,
and hence the existence of a stationary measure. For a nonlinear perturbed process
λ_t^m = d + f_1(λ_{t-1}^m) + f_2(Y_{t-1}^m) + ε_{t,m},

the corresponding simple condition is 0 < f_1′(λ) + f_2′(λ) < 1 outside a small (often taken to be compact) set C, with boundedness inside; that is, a nonglobal contraction suffices.
On the other hand, it has been seen earlier that typically a global contraction is used to obtain the existence of a stationary measure. Based on this one might think that using the perturbation approach it is possible to get away with weaker conditions when it comes to deriving an asymptotic theory for parameter estimates. Unfortunately, this is not the case, because in the next step we must let m → ∞ and c_m → 0 when we want to approximate (4.4), (4.5) by (4.18), (4.19). Then we must have λ_t^m → λ_t and Y_t^m → Y_t in some sense. The proof of this (see the proof of Proposition 3 in Fokianos and Tjøstheim 2012) requires stricter conditions, essentially identical to those required in the nonperturbed approach. In the linear case the global and local conditions are the same.
Almost all of the theory developed so far, be it for the perturbed or nonperturbed case, has been stated for a first-order model. In principle it can be extended to a higher-order model, but this is not trivial. Ferland et al. (2006) discuss wide-sense stationarity for an ARMA-type model

λ_t = d + Σ_{i=1}^{p} a_i λ_{t-i} + Σ_{i=1}^{q} b_i Y_{t-i}.    (4.20)
Davis and Liu (2014) in their Proposition 5 prove the geometric moment contracting prop-
erty and the existence of a unique stationary solution under mild regularity conditions.
Franke (2010) proves so-called θ-weak dependence and strict stationarity of {Y_t} in the nonlinear system

Y_t = N_t(λ_t),    λ_t = g(λ_{t-1}, ..., λ_{t-p}, Y_{t-1}, ..., Y_{t-q}).
When it comes to the perturbed process of (4.20), it can be converted to a Markov vector process. Again, geometric ergodicity can be proved with a one-component perturbation, the appropriate condition being 0 < Σ_i a_i + Σ_i b_i < 1.
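As an illustration of the Markov vector conversion (here without the perturbation term), the following sketch stacks the lagged intensities and counts of (4.20) into a single state vector and simulates the resulting first-order chain. The coefficients are illustrative and chosen so that Σ_i a_i + Σ_i b_i < 1, and the sample mean of λ_t is compared with its stationary value d/(1 - Σ_i a_i - Σ_i b_i).

```python
import numpy as np

rng = np.random.default_rng(4)

def step(state, d, a, b):
    """One transition of the Markov vector chain embedding (4.20):
    state = (lambda_t, ..., lambda_{t-p+1}, Y_t, ..., Y_{t-q+1})."""
    p = len(a)
    lam_hist, y_hist = state[:p], state[p:]
    lam_new = d + a @ lam_hist + b @ y_hist
    y_new = rng.poisson(lam_new)
    return np.concatenate(([lam_new], lam_hist[:-1], [y_new], y_hist[:-1]))

# Assumed illustrative coefficients with sum(a) + sum(b) < 1 (p = q = 2).
d, a, b = 0.5, np.array([0.3, 0.1]), np.array([0.25, 0.15])
stationary_mean = d / (1 - a.sum() - b.sum())
state = np.concatenate((np.full(2, stationary_mean), np.zeros(2)))
path = []
for _ in range(5000):
    state = step(state, d, a, b)
    path.append(state[0])
# Sample mean of lambda_t versus the stationary mean d/(1 - sum a - sum b).
print(np.mean(path), stationary_mean)
```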