4: Count Time Series with Observation-Driven Autoregressive Parameter Dynamics (4/5)


92 Handbook of Discrete-Valued Time Series

4.3 Statistical Inference

In Section 4.2, I have outlined two main approaches to the theory of INGAR models. In one approach one studies the properties of the model itself, and a number of results on the probabilistic properties, in particular on the existence of a stationary measure, are now available. The lack of φ-irreducibility (and hence of geometric ergodicity) does create some problems. The following quote by Woodard et al. (2011) may illustrate the problem of bridging the probabilistic results on the existence of a stationary measure with the convergence results needed in asymptotic theory. They state that results on the existence of a stationary measure “lay the foundation for showing convergence of time averages for a broad class of functions, and asymptotic properties of maximum likelihood estimators. However, these results are not immediate. For instance, laws of large numbers do exist for non-φ-irreducible stationary processes (cf. again Meyn and Tweedie 2009, Theorem 17.1.2), and show that the averages of bounded functionals converge. However, the value to which they converge may depend on the initialization of the process. (It may be possible to obtain correct limits of time averages by restricting the class of functions under consideration, or by obtaining additional mixing results for the time series under consideration).”

I am going to review the work on statistical inference for the direct approach and the perturbed approach in Sections 4.3.1 and 4.3.2, respectively, but first let me give some general comments on possible advantages and drawbacks of the two methods.

An obvious advantage of the direct method is that it only uses properties of the process defined by the original model itself. A possible drawback is that one assumes the process to be in its stationary state, granted that the conditions for the existence of a stationary state are fulfilled. This makes it possible to extend the process to the time indices 0, ±1, ±2, .... It is not always straightforward to link a likelihood based on a stationary solution to a likelihood depending on a given initial condition. An approximation argument seems to be needed, as in Wang et al. (2014). In the perturbation method the process can be started from an arbitrary initial point, because when the process is perturbed, one typically obtains a φ-irreducible, geometrically ergodic process, and the inference theory for such processes does not require the process to be in a stationary state. The geometric ergodicity drives the process towards its stationary state at a geometric rate. A disadvantage of this approach is that the perturbed process is just an intermediate step, and a different, but in some cases similar, approximation argument to that mentioned earlier for the likelihoods is needed. Both methods require a type of contraction condition. When it comes to the problem of estimating parameters, the maximum likelihood estimates for the two methods are usually identical, but in deriving properties of the estimates the methods may differ and require arguments of different complexity.
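To make the role of a contraction condition concrete, here is a minimal sketch of how the dependence on the initial condition can die out. It assumes the linear Poisson recursion λ_t = d + aλ_{t−1} + bY_{t−1} with 0 ≤ a < 1 as the form of the linear model; the function names, the counts, and the parameter values are hypothetical and chosen only for illustration. Two intensity paths driven by the same observations but started from different values of λ_0 differ by exactly a^t times the initial gap at time t, which is the kind of geometric forgetting that an approximation argument between a stationary likelihood and a likelihood with a fixed start can exploit.

```python
def intensity_path(y, d, a, b, lam0):
    """Return [lam_0, ..., lam_n] for lam_t = d + a*lam_{t-1} + b*y_{t-1}.

    y holds the observed counts y_0, ..., y_{n-1}; lam0 is the (arbitrary)
    starting intensity. The linear form is an assumption of this sketch.
    """
    lams = [lam0]
    for y_prev in y:
        lams.append(d + a * lams[-1] + b * y_prev)
    return lams


def initial_value_gaps(y, d, a, b, lam0_first, lam0_second):
    """Absolute gap between two intensity paths that share the same data."""
    first = intensity_path(y, d, a, b, lam0_first)
    second = intensity_path(y, d, a, b, lam0_second)
    return [abs(u - v) for u, v in zip(first, second)]


# Hypothetical counts and parameter values, used only for illustration.
Y_OBS = [2, 0, 1, 3, 1, 0, 4, 2, 1, 1, 0, 2]
A_COEF = 0.4
GAPS = initial_value_gaps(Y_OBS, d=0.5, a=A_COEF, b=0.3,
                          lam0_first=0.1, lam0_second=25.0)

if __name__ == "__main__":
    for t, gap in enumerate(GAPS):
        print(t, gap)
```

Because the observations enter both recursions identically, they cancel in the difference, so the decay here is deterministic. For the nonlinear and log linear models the corresponding statement needs the contraction assumptions discussed in the text.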

4.3.1 Asymptotic Estimation Theory without Perturbation

This is a new topic, so the literature is not extensive. The parameter estimation problem has been treated by Davis and Liu (2014), Wang et al. (2014), Douc et al. (2013), Woodard et al. (2011), and Christou and Fokianos (2014). These authors make somewhat different choices of methods, but the main methodological difference seems to lie in the way the contraction condition is established. Once this is established, fairly standard consistency and likelihood arguments are used. Other aspects of statistical inference are treated by Wu and Shao (2004), Fokianos and Neumann (2013), Fokianos and Fried (2010), and Christou and Fokianos (2013, 2015). I will concentrate on parameter estimation here, and then briefly mention other aspects.

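Davis and Liu (2014) treat a general model

$$Y_t \mid \mathcal{F}_{t-1} \sim p(y \mid \eta_t), \qquad X_t = g_\theta(X_{t-1}, Y_{t-1}),$$

where $\mathcal{F}_t = \sigma\{\eta_1, Y_1, \ldots, Y_t\}$, $X_t = E(Y_t \mid \mathcal{F}_{t-1}) = B(\eta_t)$, and $p(\cdot \mid \eta)$ is a distribution from a one-parameter exponential family

$$p(y \mid \eta) = \exp\{\eta y - A(\eta)\}\, h(y),$$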

where $\eta$ is the natural parameter (e.g., $\ln \lambda$ in a Poisson distribution), and $A(\eta)$ and $h(y)$ are known functions. Many familiar distributions belong to this family, such as the Poisson, negative binomial, Bernoulli, and exponential distributions. In Davis and Liu (2014) the Poisson case and the negative binomial with parameters $r$ and $p_t$ are treated. For the latter the integer parameter $r$ is fixed, and the probability parameter $p_t$ is allowed to be stochastic and recursively updated. Using a contraction condition on $g$ and the backward process of Diaconis and Freedman (1999), they establish the existence of a unique stationary measure and prove that the process $\{X_t\}$ is geometric moment contracting, as in Wu and Shao (2004). This in turn is used to establish that $\{Y_t\}$ is absolutely regular and $\{X_t, Y_t\}$ is ergodic, as did Neumann (2011) using other means. The likelihood function is given by

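$$L(\theta \mid Y_1, \ldots, Y_n; \eta_1) = \prod_{t=1}^{n} \exp\{\eta_t(\theta) Y_t - A(\eta_t(\theta))\}\, h(Y_t).$$

Taking logs and differentiating, the score function is obtained as

$$S_n(\theta) = \frac{\partial l(\theta)}{\partial \theta} = \sum_{t=1}^{n} \{Y_t - B(\eta_t(\theta))\}\, \frac{\partial \eta_t(\theta)}{\partial \theta},$$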

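where $B$ is the derivative of $A$, that is, $B(\eta) = A'(\eta)$. Note that $B'(\eta) = \operatorname{var}(Y_t) > 0$, so that $B(\eta)$ is strictly increasing. Moreover, $X_t = B(\eta_t) = E(Y_t \mid \mathcal{F}_{t-1})$, $\eta_t = B^{-1}(X_t)$, and $X_t = g_\infty^\theta(Y_{t-1}, Y_{t-2}, \ldots)$ by expanding $\{Y_t\}$ back to $-\infty$ as explained in Section 4.2.4. If we let $\hat{\theta}_n$ be a solution of $S_n(\theta) = 0$, then under a string of regularity conditions given in Davis and Liu (2014), strong consistency is obtained and asymptotic normality of $\hat{\theta}_n$ is demonstrated, that is,

$$\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, \Omega^{-1}),$$

where

$$\Omega = E\left\{ B'(\eta_t(\theta))\, \frac{\partial \eta_t(\theta)}{\partial \theta}\, \frac{\partial \eta_t(\theta)}{\partial \theta'} \right\},$$

and where $'$ is used to denote the transpose. In the Poisson case $\eta_t = \ln \lambda_t$ and

$$L(\theta) = \prod_{t=1}^{n} \frac{\exp(-\lambda_t(\theta))\, \lambda_t(\theta)^{Y_t}}{Y_t!}, \qquad A(\eta_t) = e^{\eta_t(\theta)}, \qquad \text{and} \qquad \Omega = E\left\{ e^{\eta_t(\theta)}\, \frac{\partial \eta_t(\theta)}{\partial \theta}\, \frac{\partial \eta_t(\theta)}{\partial \theta'} \right\},$$

which corresponds to the asymptotic distribution in the log linear case in Fokianos and Tjøstheim (2011).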
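As a worked check of the Poisson case, the following lines spell out the substitution into the exponential family form. Nothing beyond the definitions of $A$, $B$, and $h$ already given is used; the last line only applies the chain rule $\partial \lambda_t / \partial \theta = \lambda_t\, \partial \eta_t / \partial \theta$ to rewrite the score in terms of $\lambda_t$.

```latex
\begin{align*}
p(y \mid \eta) &= \frac{e^{-\lambda} \lambda^{y}}{y!}
  = \exp\{ y \ln \lambda - \lambda \}\, \frac{1}{y!}
  \quad \Longrightarrow \quad
  \eta = \ln \lambda, \quad A(\eta) = e^{\eta}, \quad h(y) = \frac{1}{y!}, \\
B(\eta) &= A'(\eta) = e^{\eta} = \lambda = E(Y), \qquad
B'(\eta) = e^{\eta} = \lambda = \operatorname{var}(Y) > 0, \\
S_n(\theta) &= \sum_{t=1}^{n} \{ Y_t - \lambda_t(\theta) \}\,
  \frac{\partial \eta_t(\theta)}{\partial \theta}
  = \sum_{t=1}^{n} \left( \frac{Y_t}{\lambda_t(\theta)} - 1 \right)
  \frac{\partial \lambda_t(\theta)}{\partial \theta}.
\end{align*}
```

With $B'(\eta_t) = e^{\eta_t}$ the general expression for the limiting information matrix reduces to the Poisson expression, so the two displays are consistent term by term.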


Consistency is proved using standard arguments, whereas asymptotic normality is proved by a linearization argument and the fact that $n^{-1/2}\{Y_t - B(\eta_t(\theta))\}\, \partial \eta_t / \partial \theta$ is a martingale difference sequence.

Christou and Fokianos (2014) look at the negative binomial case too, and they approach

the problem somewhat differently. They use the fact that an appropriate mixture of Poisson

distributions yields a negative binomial distribution, and obtain a one-parameter negative

binomial distribution in that way. Subsequently, this is used in treating overdispersion in

a natural manner.

Somewhat different arguments to those employed by Davis and Liu (2014) were used

in Wang et al. (2014) in a threshold-like model. The existence of a stationary measure is

established using the weak Feller and e-chain properties.

Douc et al. (2013) restrict themselves to proving consistency in a model

$$Y_t \mid \mathcal{F}_{t-1} \sim H(X_{t-1}, \cdot), \qquad X_t = F_{Y_t}(X_{t-1}),$$

where $H$ is a Markov kernel and $\mathcal{F}_t = \sigma(X_s, Y_s,\ s \le t,\ s \in \mathbb{N})$. The authors have a series of regularity conditions, among them the asymptotic strong Feller property (Hairer and Mattingly 2006), the existence of a reachable point, and a contraction condition set up in a coupling context. They use these conditions to establish a unique invariant measure for the Markov kernel under consideration.

4.3.2 The Perturbation Approach

This approach is carried out in Fokianos et al. (2009), Fokianos and Tjøstheim (2011, 2012)

and referred to in Woodard et al. (2011). In those papers it is restricted to the Poisson case

for the distribution of the innovations, but I believe that it has wider potential. In the linear

case, the essence of the method is to set up a perturbed model (4.18), (4.19) in addition to the

original model (4.4), (4.5) to obtain a model that is geometrically ergodic and analogously

for the log linear and nonlinear models. Next, this is used to construct two likelihoods, and

then to show that their difference tends to zero as the size of the perturbation decreases to

zero. This is nally utilized to show that the asymptotics of the parameter estimates of the

original model are obtained as limits of the asymptotics of the perturbed model.

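We will illustrate this for the nonlinear model of Fokianos and Tjøstheim (2012):

$$Y_t = N_t(\lambda_t), \qquad \lambda_t = f_1(\lambda_{t-1}) + f_2(Y_{t-1}),$$

with assumptions on $f_1$ and $f_2$ as already stated in Section 4.2.7. The corresponding perturbed model is given by

$$Y_t^m = N_t(\lambda_t^m), \qquad \lambda_t^m = f_1(\lambda_{t-1}^m) + f_2(Y_{t-1}^m) + \varepsilon_{t,m},$$

$$\varepsilon_{t,m} = c_m\, 1(Y_{t-1}^m = 1)\, U_t, \qquad c_m > 0, \qquad U_t \sim \text{iid } U[0, 1],$$

where $c_m \to 0$ as $m \to \infty$, and where other possibilities of perturbation exist.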
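The effect of the perturbation can be explored numerically with a coupled simulation. The sketch below is only an illustration under stated assumptions: it takes the linear special case f_1(λ) = d + aλ and f_2(y) = by with hypothetical parameter values, and it drives both systems with the same unit-rate Poisson process in each time step, so that Y_t and Y_t^m are the numbers of arrivals in [0, λ_t] and [0, λ_t^m], respectively. All function names are made up for this example.

```python
import random


def arrivals_up_to(rng, horizon):
    """Arrival times of a unit-rate Poisson process on [0, horizon]."""
    times = []
    t = rng.expovariate(1.0)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(1.0)
    return times


def poisson_count(arrivals, lam):
    """N_t(lam): number of arrivals falling in [0, lam]."""
    return sum(1 for s in arrivals if s <= lam)


def mean_intensity_gap(c_m, n=2000, d=0.5, a=0.4, b=0.3, lam0=1.0, seed=1):
    """Average of |lam_t^m - lam_t| over n steps of the coupled systems.

    Assumes f1(lam) = d + a*lam and f2(y) = b*y (a linear special case).
    """
    rng = random.Random(seed)
    lam = lam_m = lam0
    y = y_m = 0
    total = 0.0
    for _ in range(n):
        u = rng.random()                      # U_t ~ U[0, 1]
        eps = c_m * u if y_m == 1 else 0.0    # c_m * 1(Y_{t-1}^m = 1) * U_t
        lam = d + a * lam + b * y
        lam_m = d + a * lam_m + b * y_m + eps
        # One shared Poisson process per step couples the two counts.
        arrivals = arrivals_up_to(rng, max(lam, lam_m))
        y = poisson_count(arrivals, lam)
        y_m = poisson_count(arrivals, lam_m)
        total += abs(lam_m - lam)
    return total / n


if __name__ == "__main__":
    for c_m in (0.5, 0.1, 0.01, 0.0):
        print(c_m, mean_intensity_gap(c_m))
```

With c_m = 0 the two recursions coincide exactly, and for small c_m the average gap is expected to be small. This is a numerical illustration of the approximation idea, not a substitute for the contraction argument.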

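The likelihoods are given by

$$L(\theta) = \prod_{t=1}^{n} \frac{\exp(-\lambda_t(\theta))\, \lambda_t(\theta)^{Y_t}}{Y_t!}$$

and

$$L^m(\theta) = \prod_{t=1}^{n} \frac{\exp(-\lambda_t^m(\theta))\, (\lambda_t^m(\theta))^{Y_t^m}}{Y_t^m!}\ \prod_{t=1}^{n} f_u(U_t)$$

by the Poisson assumption and the assumed independence of $U_t$ from $(Y_{t-1}^m, \lambda_{t-1}^m)$, with $f_u(u)$ denoting the uniform density. Moreover, the log likelihood and the scores are given by

$$l^m(\theta) = \sum_{t=1}^{n} \left( Y_t^m \ln \lambda_t^m(\theta) - \lambda_t^m(\theta) \right) + \sum_{t=1}^{n} \ln f_u(U_t) - \sum_{t=1}^{n} \ln Y_t^m!$$

and

$$S_n^m(\theta) = \frac{\partial l^m(\theta)}{\partial \theta} = \sum_{t=1}^{n} \left( \frac{Y_t^m}{\lambda_t^m(\theta)} - 1 \right) \frac{\partial \lambda_t^m(\theta)}{\partial \theta}.$$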
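The Poisson log likelihood and score can be evaluated recursively once a form for the intensity is fixed. The following sketch assumes the linear recursion λ_t = d + aλ_{t−1} + bY_{t−1} with a fixed starting value λ_0, so that θ = (d, a, b) and the derivatives ∂λ_t/∂θ obey their own linear recursion. The data and parameter values are hypothetical. A central finite-difference approximation is included as a sanity check of the analytic score.

```python
import math


def poisson_loglik_and_score(theta, y, lam0):
    """Log likelihood and score for lam_t = d + a*lam_{t-1} + b*y_{t-1}.

    y = [y_0, ..., y_n]; lam0 is treated as fixed, so its derivative is 0.
    The linear form of the intensity is an assumption of this sketch.
    """
    d, a, b = theta
    lam = lam0
    dlam = [0.0, 0.0, 0.0]          # d(lam_0)/d(theta)
    loglik = 0.0
    score = [0.0, 0.0, 0.0]
    for t in range(1, len(y)):
        # Derivative recursion; uses lam_{t-1} before it is overwritten.
        dlam = [1.0 + a * dlam[0],
                lam + a * dlam[1],
                y[t - 1] + a * dlam[2]]
        lam = d + a * lam + b * y[t - 1]
        loglik += y[t] * math.log(lam) - lam - math.lgamma(y[t] + 1)
        weight = y[t] / lam - 1.0   # (Y_t / lam_t - 1)
        score = [s + weight * g for s, g in zip(score, dlam)]
    return loglik, score


def numerical_score(theta, y, lam0, h=1e-6):
    """Central finite differences of the log likelihood."""
    grads = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        diff = (poisson_loglik_and_score(up, y, lam0)[0]
                - poisson_loglik_and_score(down, y, lam0)[0])
        grads.append(diff / (2.0 * h))
    return grads


# Hypothetical counts and parameter values, used only for illustration.
Y_OBS = [2, 0, 1, 3, 1, 0, 4, 2, 1, 1, 0, 2, 3, 1, 0, 1, 2, 5, 1, 0]
THETA = [0.5, 0.4, 0.3]

if __name__ == "__main__":
    print(poisson_loglik_and_score(THETA, Y_OBS, lam0=1.0))
    print(numerical_score(THETA, Y_OBS, lam0=1.0))
```

The uniform density term of the perturbed likelihood does not depend on θ, so it would not change the score. It is omitted here.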

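The Hessian is given by

$$H_n^m(\theta) = -\frac{\partial^2 l^m(\theta)}{\partial \theta\, \partial \theta'} = \sum_{t=1}^{n} \frac{Y_t^m}{(\lambda_t^m(\theta))^2}\, \frac{\partial \lambda_t^m(\theta)}{\partial \theta}\, \frac{\partial \lambda_t^m(\theta)}{\partial \theta'} - \sum_{t=1}^{n} \left( \frac{Y_t^m}{\lambda_t^m(\theta)} - 1 \right) \frac{\partial^2 \lambda_t^m(\theta)}{\partial \theta\, \partial \theta'}.$$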

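Using $E(Y_t^m \mid \mathcal{F}_{t-1}^m) = \lambda_t^m$, the expectation of a summand of the Hessian is

$$G^m(\theta) = E\left\{ \frac{1}{\lambda_t^m}\, \frac{\partial \lambda_t^m}{\partial \theta}\, \frac{\partial \lambda_t^m}{\partial \theta'} \right\}.$$

Exactly the same calculation can be carried through for the unperturbed system, resulting in

$$G(\theta) = E\left\{ \frac{1}{\lambda_t}\, \frac{\partial \lambda_t}{\partial \theta}\, \frac{\partial \lambda_t}{\partial \theta'} \right\}.$$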
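In practice the information matrix is unknown, and a natural plug-in is the sample average of the terms inside the expectation. The sketch below is an illustration only: it assumes the linear recursion λ_t = d + aλ_{t−1} + bY_{t−1} with fixed λ_0, uses hypothetical data, and evaluates everything at a given θ (in applications this would be the maximum likelihood estimate). Approximate standard errors then come from the diagonal of the inverse of the estimated matrix divided by n. The small Gauss-Jordan routine is included only to keep the example self-contained.

```python
import math


def intensity_gradients(theta, y, lam0):
    """List of (lam_t, d lam_t / d theta) for t = 1, ..., n (linear model)."""
    d, a, b = theta
    lam, grad = lam0, [0.0, 0.0, 0.0]
    out = []
    for t in range(1, len(y)):
        grad = [1.0 + a * grad[0], lam + a * grad[1], y[t - 1] + a * grad[2]]
        lam = d + a * lam + b * y[t - 1]
        out.append((lam, grad))
    return out


def information_estimate(theta, y, lam0):
    """Sample average of (1/lam_t) * grad_t * grad_t'; returns (G_hat, n)."""
    terms = intensity_gradients(theta, y, lam0)
    k = len(theta)
    total = [[0.0] * k for _ in range(k)]
    for lam, grad in terms:
        for i in range(k):
            for j in range(k):
                total[i][j] += grad[i] * grad[j] / lam
    n = len(terms)
    return [[v / n for v in row] for row in total], n


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def standard_errors(theta, y, lam0):
    """sqrt(diag(G_hat^{-1}) / n), based on sqrt(n)(theta_hat - theta) -> N(0, G^{-1})."""
    g_hat, n = information_estimate(theta, y, lam0)
    g_inv = invert(g_hat)
    return [math.sqrt(g_inv[i][i] / n) for i in range(len(theta))]


# Hypothetical counts and parameter values, used only for illustration.
Y_OBS = [2, 0, 1, 3, 1, 0, 4, 2, 1, 1, 0, 2, 3, 1, 0, 1, 2, 5, 1, 0]
THETA = [0.5, 0.4, 0.3]

if __name__ == "__main__":
    print(standard_errors(THETA, Y_OBS, lam0=1.0))
```

With only twenty observations the resulting standard errors are of course very rough. The point is the structure of the computation, not the numbers.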

Asymptotic normality for the parameter estimates of the perturbed system follows from standard asymptotic theory. Asymptotic normality with covariance matrix $G^{-1}$ for the unperturbed system will follow from an approximation theorem of Brockwell and Davis (1987), Proposition 6.3.9, if it can be shown that $G^m(\theta)$ converges to $G(\theta)$ when $m \to \infty$ and the perturbation tends to zero as $c_m \to 0$. It is in this argument that the contraction assumption is needed. One needs to evaluate differences

$$\frac{\partial \lambda_t^m}{\partial \theta} - \frac{\partial \lambda_t}{\partial \theta} \qquad \text{and} \qquad E\left\{ \frac{1}{\lambda_t^m}\, \frac{\partial \lambda_t^m}{\partial \theta}\, \frac{\partial \lambda_t^m}{\partial \theta'} - \frac{1}{\lambda_t}\, \frac{\partial \lambda_t}{\partial \theta}\, \frac{\partial \lambda_t}{\partial \theta'} \right\},$$

and using the regularity conditions of Fokianos and Tjøstheim (2012) these are transformed to showing

$$E|\lambda_t^m - \lambda_t| \to 0, \qquad E(\lambda_t^m - \lambda_t)^2 \to 0,$$

$$E|Y_t^m - Y_t| \to 0, \qquad E(Y_t^m - Y_t)^2 \to 0,$$

and almost surely

$$|\lambda_t^m - \lambda_t| \to 0, \qquad |Y_t^m - Y_t| \to 0.$$

For a proof of these we refer to Fokianos and Tjøstheim (2012).

Similar techniques can be used in the log linear case, where

$$Y_t = N_t(\lambda_t), \qquad \nu_t = d + a\nu_{t-1} + b \ln(Y_{t-1} + 1), \qquad \lambda_t = \exp(\nu_t).$$

In this case there is a large discrepancy between the conditions required for approximating the unperturbed system by the perturbed system and the conditions required for geometric ergodicity of

$$Y_t^m = N_t(\lambda_t^m), \qquad \nu_t^m = d + a\nu_{t-1}^m + b \ln(Y_{t-1}^m + 1) + \varepsilon_{t,m},$$

where $\nu_0^m$, $Y_0^m$ are fixed and $\{\varepsilon_{t,m}\}$ is as in (4.19). For geometric ergodicity it is sufficient that $|a| < 1$ and, in addition, that $|a + b| < 1$ if $b > 0$ and $|a||a + b| < 1$ if $b < 0$. These conditions are much milder than the corresponding condition $|a + b| < 1$ for the linear system (4.4), (4.5). This is because for the log linear system the process can flip back and forth between the negative and positive domain. To obtain global contraction, where the perturbed likelihood can be compared to the unperturbed one, much stricter conditions are required. In Fokianos and Tjøstheim (2012), a set of sufficient conditions was obtained by requiring $|a + b| < 1$ if $a$ and $b$ have the same sign and $a^2 + b^2 < 1$ if $a$ and $b$ have different signs. Douc et al. (2013), who did not consider the asymptotic distribution but did consider contraction, improved on this and obtained the sufficient condition $|a + b| \vee |a| \vee |b| < 1$.
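The three sets of parameter conditions for the log linear model are easy to confuse, so a small checker may help when experimenting with values of a and b. The sketch below encodes the conditions as I read them from the text, assuming that the different-sign contraction condition is a² + b² < 1. The function names are hypothetical, and treating b = 0 as "same sign" is a convention of this sketch, not something stated in the sources.

```python
def geometric_ergodicity_perturbed(a, b):
    """Sufficient condition for geometric ergodicity of the perturbed system."""
    if abs(a) >= 1:
        return False
    if b > 0:
        return abs(a + b) < 1
    if b < 0:
        return abs(a) * abs(a + b) < 1
    return True  # b == 0: only |a| < 1 is needed


def contraction_same_or_different_sign(a, b):
    """Contraction condition split by the signs of a and b.

    Assumes the different-sign case reads a^2 + b^2 < 1; a*b == 0 is
    treated as 'same sign' (a convention of this sketch).
    """
    if a * b >= 0:
        return abs(a + b) < 1
    return a * a + b * b < 1


def contraction_max_condition(a, b):
    """The condition |a + b| v |a| v |b| < 1."""
    return max(abs(a + b), abs(a), abs(b)) < 1


if __name__ == "__main__":
    for a, b in [(0.3, 0.4), (0.5, -0.9), (0.9, -1.5)]:
        print((a, b),
              geometric_ergodicity_perturbed(a, b),
              contraction_same_or_different_sign(a, b),
              contraction_max_condition(a, b))
```

For example, (a, b) = (0.5, −0.9) satisfies the ergodicity condition and the maximum condition but not the sign-split condition, while (0.9, −1.5) satisfies only the ergodicity condition, in line with the remark that geometric ergodicity requires much less than global contraction.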

The perturbation approach is also possible for non-Poisson processes. For the model considered by Davis and Liu (2014), for example, one must show that geometric moment contractivity is enough to prove that the unperturbed likelihood can be approximated by a perturbed likelihood. I conjecture that this is the case under weak regularity conditions.

Finally, to sum up, the perturbation approach and the direct unperturbed approach are not mutually exclusive. The direct approach gives the fundamental probabilistic properties of the process. Geometric ergodicity is not present, and, at least in a time series sense, a nonstandard Markov chain theory is required. The perturbation approach is only used in the inference stage. Asymptotic results are first found for the perturbed system. The geometric ergodicity means that the process is not required to be in its stationary state, and in a time series sense more standard Markov theory can be used.
