4.2.3 Coupling Arguments
An alternative but overlapping approach to proving existence of a stationary measure pro-
ceeds via coupling arguments. This is the approach of Neumann (2011). See also Franke
(2010). Neumann works with the quite general nonlinear recursive system,
Y_t | F_{t-1}^{Y,λ} ∼ Poisson(λ_t),    λ_t = f(λ_{t-1}, Y_{t-1}),    (4.16)
which clearly is also related to the iterative system (4.10). The function f is supposed to
satisfy the contractive condition,
|f(λ, y) - f(λ′, y′)| ≤ α_1 |λ - λ′| + α_2 |y - y′|,    (4.17)
so that α = α_1 + α_2 < 1. This is essentially the sort of condition mentioned for the nonlinear
additive system (4.15), and the same method as was used to prove the weak Feller property
for the linear system can be used here, whereas Chebyshev’s inequality can be used to
prove boundedness in probability (tightness). Theorem 12.1.2 (ii) of Meyn and Tweedie
(2009) then implies the existence of a stationary measure. (Note that this is different from
Theorem 4.1.)
If we let Q_t^x be the conditional distribution of (Y_t, λ_t) given λ_0 = x, then {Q_t^x, t ∈ N} is tight. Hence, there is a subsequence {t_k, k ∈ N} such that Q_{t_k}^x converges weakly to some probability measure π_x as k → ∞. Coupling arguments are now used to show that this limit does not depend on the starting value x, and that the full sequence {Q_t^x, t ∈ N} converges. This implies the existence of a unique stationary distribution.
A brief indication of the coupling argument is as follows: In addition to Y_t, we construct another process Y_t′ with another starting value y, but other than that its probability generating mechanism is identical to that of Y_t. Using general properties of the Poisson process and the contractivity condition, one obtains

E(|λ_t - λ_t′| | x, y) ≤ α^t |x - y|

and

E(|Y_t - Y_t′| | x, y) ≤ α^t |x - y|,
and tightness arguments again can be used to complete the proof. Douc et al. (2013)
use coupling techniques in addition to the asymptotic strong Feller criterion of Woodard
et al. (2011).
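To make the coupling construction concrete, here is a minimal Monte Carlo sketch (not taken from the cited papers) for the linear specification f(λ, y) = d + aλ + by with illustrative values d = 0.5, a = 0.4, b = 0.3, so that α = a + b = 0.7. The two copies share the driving Poisson randomness through a thinning construction, and the estimated mean gap E(|λ_t - λ_t′| | x, y) is compared with the bound α^t |x - y|. All function names and parameter values are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def coupled_step(lam, lam2, d=0.5, a=0.4, b=0.3):
    """One step of two coupled copies of Y_t ~ Poisson(lambda_t),
    lambda_t = d + a*lambda_{t-1} + b*Y_{t-1}, sharing the same
    underlying Poisson randomness via a thinning construction."""
    base = rng.poisson(min(lam, lam2))    # points lying below both intensities
    extra = rng.poisson(abs(lam - lam2))  # points between the two intensities
    y = base + (extra if lam >= lam2 else 0)
    y2 = base + (extra if lam2 > lam else 0)
    return d + a * lam + b * y, d + a * lam2 + b * y2

def mean_gap(t_max, x=1.0, y=8.0, n_rep=2000):
    """Monte Carlo estimate of E(|lambda_t - lambda'_t| | x, y)."""
    gaps = np.zeros(t_max + 1)
    for _ in range(n_rep):
        lam, lam2 = x, y
        for t in range(1, t_max + 1):
            lam, lam2 = coupled_step(lam, lam2)
            gaps[t] += abs(lam - lam2)
    return gaps / n_rep

gap = mean_gap(t_max=15)
alpha = 0.7   # a + b, the contraction coefficient
for t in (1, 5, 10, 15):
    # estimated mean gap versus the theoretical bound alpha^t * |x - y|
    print(t, gap[t], alpha ** t * abs(1.0 - 8.0))
```

Under these assumed values the estimated gap decays geometrically and stays below the bound, illustrating how the coupling and the contraction condition work together.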
4.2.4 Ergodicity
Existence of a stationary measure is not enough to develop an appropriate law of large numbers and asymptotic distribution theory of parameter estimates. Ergodicity is needed for consistency, and often ergodicity with an additional rate of convergence to the stationary distribution, such as geometric ergodicity, is required. This in turn can be linked to mixing and mixing rates and ultimately to central limit results. Since it turns out that strong mixing cannot in general be established for the models we look at, other mixing concepts such as weak dependence, to be defined in Section 4.2.5, may be useful. Alternatively, perturbation arguments can be used to obtain strong mixing.
In the following, we will just state a few facts about ergodicity and mixing in our context. Ergodicity is of course treated at length in the book by Meyn and Tweedie for many types of Markov chains, but most of this material refers to the measure theoretic framework where irreducibility is available, and not so much to our more general situation, where the discreteness of the observations {Y_t} used as innovations creates difficulties. Here, we will look at the models given by (4.1), (4.2), and (4.4), (4.5) and then in Section 4.2.7 the perturbed class of models, where ergodicity is much easier to handle.
It should be noticed that “ergodicity” is used with somewhat different meanings in
dynamic system theory and in Markov chain theory. In dynamic system theory a strictly
stationary process is ergodic if an invariant set for the shift operator either has probabil-
ity 0 or 1. This is the sense in which ergodicity is used in Douc et al. (2013), Theorem 3.2.
Also, this is how it is mostly used in time series, possibly non-Markovian, and where it is
employed to prove almost sure convergence of parameter estimates. The basis is of course
Birkhoff’s famous ergodic theorem, Birkhoff (1931), providing a law of large numbers for
ergodic strictly stationary processes.
In Markov chain theory, however, ergodicity is used to characterize the convergence of the t-step transition probability measure P^t(x, dy) to π(dy), assuming that there exists an invariant measure π. Ergodicity for a Markov chain can then be defined as

lim_{t→∞} |P^t(x, y) - π(y)| = 0

pointwise in the countable state space case, and

lim_{t→∞} ||P^t(x, ·) - π(·)|| = 2 lim_{t→∞} sup_{A ∈ B(S)} |P^t(x, A) - π(A)| = 0

in the general state space case, where ||·|| is the total variation norm. One can prove ergodicity of (Y_t, λ_t) for the quite general system (4.16). The line of argument, due to Neumann (2011), is as follows:
It is assumed that (4.16) and (4.17) hold, that a stationary invariant measure exists, and that {Y_t, λ_t} is in its stationary regime. The count process {Y_t} is then absolutely regular. It is well known that this implies strong mixing of {Y_t} with a geometric rate. On the other hand, from Neumann (2011) or Bradley (2007), Proposition 2.8, absolute regularity of {Y_t} implies ergodicity of {Y_t}. Since we can express λ_t = f(Y_{t-1}, Y_{t-2}, ...) almost surely, Proposition 2.10 (ii) in Bradley (2007) implies that the bivariate process {Y_t, λ_t} is also ergodic. But unfortunately the bivariate process {Y_t, λ_t} is not strongly mixing in general; see Neumann (2011), Remark 3, for a counterexample. This is again caused by the combination of a discrete {Y_t} and a continuous-valued {λ_t}. Davis and Liu (2014) have extended this argument and the ergodicity results to their more general setting.
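As a purely numerical illustration of ergodicity in the Markov chain sense (not a result from the cited papers), the sketch below simulates the linear model (4.4), (4.5) with assumed parameter values from two very different starting intensities and estimates the total variation distance between the two empirical distributions of Y_t; up to Monte Carlo error, the distance should shrink toward zero as t grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_y_t(lam0, t, d=0.5, a=0.4, b=0.3, n_paths=20000):
    """Draw Y_t from the chain Y_s ~ Poisson(lambda_s),
    lambda_s = d + a*lambda_{s-1} + b*Y_{s-1}, started at lambda_0 = lam0."""
    lam = np.full(n_paths, lam0)
    y = rng.poisson(lam)
    for _ in range(t - 1):
        lam = d + a * lam + b * y
        y = rng.poisson(lam)
    return y

def tv_distance(sample1, sample2):
    """Total variation distance between two empirical pmfs on the integers."""
    hi = max(sample1.max(), sample2.max()) + 1
    p = np.bincount(sample1, minlength=hi) / len(sample1)
    q = np.bincount(sample2, minlength=hi) / len(sample2)
    return 0.5 * np.abs(p - q).sum()

for t in (1, 2, 5, 10, 20):
    y_low = simulate_y_t(lam0=0.5, t=t)
    y_high = simulate_y_t(lam0=10.0, t=t)
    print(t, tv_distance(y_low, y_high))  # shrinks towards 0 as t grows
```

Only the marginal law of Y_t is compared here; this is a simplification for illustration, not the full convergence statement for (Y_t, λ_t).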
4.2.5 Weak Dependence
We have repeatedly pointed out the difficulties in proving strong mixing for the {λ_t} process due to the discreteness of the {Y_t} process. It is an obstacle to the derivation of an asymptotic theory of parameter estimates. Similar problems occur in other fields of statistics, such as bootstrapping, where one samples from a discrete distribution. Doukhan and Louhichi (1999) invented the concept of weak dependence to get around these difficulties. The concept of weak dependence was especially designed to handle situations where there is no strong mixing.
The concept of weak dependence is perhaps best approached by consulting Doukhan and Neumann (2008) or the monograph Dedecker et al. (2007), which is partly a survey. These references also mention alternative concepts (Bickel and Bühlmann 1999; Dedecker and Prieur 2005) that have properties analogous to the weak dependence concept. In Doukhan and Neumann (2008), weak dependence is defined as follows (Z is the set of all integers, N is the set of positive integers):
A process {Y_t}, t ∈ Z, is called ψ-weakly dependent if there exists a universal null sequence {ε_r}, r ∈ N, such that for any k-tuple (s_1, ..., s_k) and any m-tuple (t_1, ..., t_m) with s_1 ≤ ··· ≤ s_k < s_k + r = t_1 ≤ ··· ≤ t_m and arbitrary measurable functions g : R^k → R, h : R^m → R with ||g||_∞ ≤ 1 and ||h||_∞ ≤ 1, the following inequality is fulfilled:

|cov(g(Y_{s_1}, ..., Y_{s_k}), h(Y_{t_1}, ..., Y_{t_m}))| ≤ ψ(k, m, Lip g, Lip h) ε_r.
Here Lip h denotes the Lipschitz modulus of continuity of h, that is,

Lip h = sup_{x ≠ y} |h(x) - h(y)| / ||x - y||_{l_1},

where ||(z_1, ..., z_m)||_{l_1} = Σ_i |z_i|, and ψ : N^2 × R_+^2 → [0, ∞) is a function to be specified.
This general denition has subsequently been rened by appropriate choices of ψ yield-
ing variants of weak ψ-dependence as explained in Doukhan and Neumann (2008). It has
been shown that weak dependence generalizes classic mixing criteria in a number of situ-
ations, and that central-limit-type results can be derived in several cases where there is a
discrete innovation process such as for Markov processes driven by discrete innovations
and bootstrap versions of linear AR processes.
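A rough numerical check of the covariance decay in the definition can be made for the linear count model with one-dimensional test functions (k = m = 1) and bounded Lipschitz choices of g and h. The snippet below is only an illustrative sketch with assumed parameter values; it is not a formal verification of ψ-weak dependence.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate one long path of the linear Poisson autoregression (4.4)-(4.5)
# with illustrative parameters d, a, b (assumed values, a + b < 1).
d, a, b, n = 0.5, 0.4, 0.3, 200000
lam, y = d / (1 - a - b), 0
y_path = np.empty(n, dtype=int)
for t in range(n):
    lam = d + a * lam + b * y
    y = rng.poisson(lam)
    y_path[t] = y

# Bounded Lipschitz test functions: ||g||_inf <= 1 and Lip g <= 1.
def g(x):
    return np.tanh(x - 2.0)

def h(x):
    return np.tanh(0.5 * x)

burn = 1000
for r in (1, 2, 5, 10, 20):
    past = g(y_path[burn:n - r])
    future = h(y_path[burn + r:])
    cov = np.mean(past * future) - past.mean() * future.mean()
    print(r, abs(cov))   # the estimated covariance decays quickly in the gap r
```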
To show that a certain process exhibits weak dependence, a contraction argument seems generally to be most efficient. In turn, the contractivity can often be proved by a coupling device. There are some applications of weak dependence to AR count processes (Franke 2010 and Doukhan et al. 2012, 2013; see also Neumann 2011), and they show weak dependence by such arguments.
The applications to count time series so far have mainly been limited to demonstrat-
ing that certain models, linear and some nonlinear, are weakly dependent (Doukhan et al.
2012, 2013, Franke 2010, and the related paper by Neumann 2011). Once weak depen-
dence is proved, it is still not trivial to establish an asymptotic theory that can be used
to study asymptotics of parameter estimates for the models we are considering. The paper
by Bardet and Wintenberger (2009) could be a useful starting point. This paper uses weak
dependence to obtain asymptotic normality of the quasi-maximum likelihood estimator for
a multidimensional causal process. See also Doukhan et al. (2012, 2013), and Christou and
Fokianos (2014).
Note that unlike much of the theory presented in this paper, the concept of weak dependence does not require a Markov framework. This is of course also the case with the mixing concepts. Franke (2010) uses contraction to prove weak dependence for non-Markovian, but Markov-type, models where

λ_t = d + Σ_i a_i λ_{t-i} + Σ_i b_i Y_{t-i}.
In Doukhan et al. (2012) it was conjectured that weak dependence also holds for nonlinear ARMA(∞, ∞) models. Unfortunately, as stated in the correction Doukhan et al. (2013) of
that paper the attempted proof of this fact in Doukhan et al. (2012) is not correct. Note that
ARMA models are also treated by Woodard et al. (2011), who transform them to a Markov
framework.
4.2.6 Markov Theory with φ-Irreducibility
Much of the theory of Meyn and Tweedie (2009) is concerned with the measure theoretic aspects of Markov chains, and there is a huge literature on continuous-valued Markov chains where this concept is exploited. Here we just comment very briefly on these developments, partly to put the preceding developments into perspective and partly as a prelude to the theory of the perturbed models presented next. Let us start by formally defining φ-irreducibility:
A Markov chain is φ-irreducible if there is a nontrivial measure φ on {S, B(S)} such that whenever φ(A) > 0, then P^t(x, A) > 0 for some t = t(x, A) ≥ 1 for all x ∈ S. The measure φ is usually assumed to be a maximal irreducibility measure; see Meyn and Tweedie (2009).
A few comments about the relationship between φ-irreducibility and the other concepts we have discussed are in order. First, from Meyn and Tweedie (2009), Proposition 6.1.5, if {X_t} is strong Feller and S contains one reachable point x_0, then {X_t} is φ-irreducible with φ = P(x_0, ·). So-called T-chains are somewhat in between weak and strong Feller chains. A T-chain is defined by Meyn and Tweedie (2009), p. 124, as a chain containing a continuous component (made precise by Meyn and Tweedie; see also the example in Nummelin 1984, p. 12). The continuous component defines a transition probability denoted by T, and if {X_t} is a T-chain and {X_t} contains a reachable point x_0, then {X_t} is φ-irreducible with φ = T(x_0, ·). By perturbing the chains (4.1), (4.2) and (4.4), (4.5) with a random sequence having a density absolutely continuous with respect to Lebesgue measure on some set A ∈ B(S) of positive Lebesgue measure, we essentially obtain a T-chain and hence φ-irreducibility, because proving the existence of a reachable point is in general not difficult.
When φ-irreducibility holds, one can define recurrence, positive recurrence, Harris recurrence, and geometric ergodicity. Note that geometric ergodicity, in contradistinction to the ergodicity discussed earlier, requires φ-irreducibility. Appendix A in Meyn and Tweedie (2009) is a useful compressed source for most of these concepts. The difference between merely having the existence of a stationary measure and having positive recurrence is highlighted in Theorems 17.1.2 and 17.1.7 in Meyn and Tweedie (2009).
4.2.7 Perturbation Method
The φ-irreducible Markov chain theory can now be illustrated on the models (4.1), (4.2),
(4.4), (4.5) when they are perturbed as in (4.9), so that the innovations have a continuous
component, effectively making this into a T-chain. We will stick to the linear model (4.4),
(4.5) in this subsection and consider more general models in the next section on inference.
As indicated in the preceding subsection, φ-irreducibility cannot be used for (4.1), (4.2) and (4.4), (4.5). For example, if d, a, and b in (4.5) are rational numbers, then {λ_t} will stay on the rational numbers if λ_0 is rational, and if φ is taken to be Lebesgue measure, a natural choice, {λ_t} is not φ-irreducible. However, by perturbing it, it may be made into a φ-irreducible process. In the linear model case this was done in Fokianos et al. (2009) by adding a continuous perturbation, obtaining a new process {Y_t^m},
Y_t^m = N_t(λ_t^m),    λ_t^m = d + a λ_{t-1}^m + b Y_{t-1}^m + ε_{t,m},    (4.18)
ε_{t,m} = c_m 1(Y_{t-1}^m = 1) U_t,    c_m > 0,    c_m → 0 as m → ∞,    (4.19)
where 1(·) is the indicator function and {U_t} is a sequence of iid uniform variables on [0, 1] such that U_t is independent of N_t(·). One can then prove φ-irreducibility and geometric ergodicity as in the proof of Lemma A1 and Proposition 2.1 in Fokianos et al. (2009). The perturbation in (4.18) is a purely auxiliary device to obtain φ-irreducibility. The U_t's could be thought of as pseudo-observations generated by the uniform law. The perturbation can be introduced in many other ways. It is enough to let {U_t} be an iid sequence of positive random variables possessing a density on the positive real axis and having a bounded support starting at zero. In addition, as will be seen in the next section, the likelihood functions for {Y_t} and {Y_t^m}, as far as dependence on {λ_t} and {λ_t^m} is concerned, will be the same for models (4.4), (4.5) and (4.18), (4.19). It will be noted that both {Y_t} and {Y_t^m} can be identified with the observations in the expression for the likelihood, but they cannot be identified as stochastic variables since they are generated by different models.
The simple condition 0 < a+b < 1 implies geometric ergodicity of the perturbed process,
and hence the existence of a stationary measure. For a nonlinear perturbed process
λ_t^m = d + f_1(λ_{t-1}^m) + f_2(Y_{t-1}^m) + ε_{t,m},

the corresponding simple condition is 0 < f_1′(λ) + f_2′(λ) < 1 outside a small (often taken to be compact) set C, with boundedness inside; that is, a nonglobal contraction suffices.
On the other hand, it has been seen earlier that typically a global contraction is used to obtain the existence of a stationary measure. Based on this one might think that using the perturbation approach it is possible to get away with weaker conditions when it comes to deriving an asymptotic theory for parameter estimates. Unfortunately, this is not the case, because in the next step we must let m → ∞ and c_m → 0 when we want to approximate (4.4), (4.5) by (4.18), (4.19). Then we must have λ_t^m → λ_t and Y_t^m → Y_t in some sense. The proof of this (see the proof of Proposition 3 in Fokianos and Tjøstheim 2012) requires stricter conditions, essentially identical to those required in the nonperturbed approach. In the linear case the global and local conditions are the same.
Almost all of the theory developed so far, be it for the perturbed or nonperturbed case, has been stated for a first-order model. In principle it can be extended to a higher-order model, but this is not trivial. Ferland et al. (2006) discuss wide-sense stationarity for an ARMA-type model

λ_t = d + Σ_{i=1}^{p} a_i λ_{t-i} + Σ_{i=1}^{q} b_i Y_{t-i}.    (4.20)
Davis and Liu (2014) in their Proposition 5 prove the geometric moment contracting prop-
erty and the existence of a unique stationary solution under mild regularity conditions.
Franke (2010) proves so-called θ-weak dependence and strict stationarity of {Y_t} in the nonlinear system

Y_t = N_t(λ_t),    λ_t = g(λ_{t-1}, ..., λ_{t-p}, Y_{t-1}, ..., Y_{t-q}).
When it comes to the perturbed process of (4.20), it can be converted to a Markov vector process. Again, geometric ergodicity can be proved with a one-component perturbation, the appropriate condition being 0 < Σ_i a_i + Σ_i b_i < 1.
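As an illustration of the Markov vector conversion (here without the perturbation term), the following sketch stacks the lagged intensities and counts of (4.20) into a single state vector and simulates the resulting first-order chain. The coefficients are illustrative and chosen so that Σ_i a_i + Σ_i b_i < 1, and the sample mean of λ_t is compared with its stationary value d/(1 - Σ_i a_i - Σ_i b_i).

```python
import numpy as np

rng = np.random.default_rng(4)

def step(state, d, a, b):
    """One transition of the Markov vector chain embedding (4.20):
    state = (lambda_t, ..., lambda_{t-p+1}, Y_t, ..., Y_{t-q+1})."""
    p = len(a)
    lam_hist, y_hist = state[:p], state[p:]
    lam_new = d + a @ lam_hist + b @ y_hist
    y_new = rng.poisson(lam_new)
    return np.concatenate(([lam_new], lam_hist[:-1], [y_new], y_hist[:-1]))

# Assumed illustrative coefficients with sum(a) + sum(b) < 1 (p = q = 2).
d, a, b = 0.5, np.array([0.3, 0.1]), np.array([0.25, 0.15])
stationary_mean = d / (1 - a.sum() - b.sum())
state = np.concatenate((np.full(2, stationary_mean), np.zeros(2)))
path = []
for _ in range(5000):
    state = step(state, d, a, b)
    path.append(state[0])
# Sample mean of lambda_t versus the stationary mean d/(1 - sum a - sum b).
print(np.mean(path), stationary_mean)
```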