92 Handbook of Discrete-Valued Time Series
4.3 Statistical Inference
In Section 4.2, I have outlined two main approaches to the theory of INGAR models. In one
approach one studies the properties of the model itself, and a number of results on the prob-
abilistic properties, in particular on the existence of a stationary measure, are now available.
The lack of φ-irreducibility (and hence geometric ergodicity) does create some problems.
The following quote by Woodard et al. (2011) may illustrate the problem of bridging the
probabilistic results on the existence of a stationary measure with the convergence results
needed in asymptotic theory. They state that the existence of a stationary measure “lay the
foundation for showing convergence of time averages for a broad class of functions, and
asymptotic properties of maximum likelihood estimators. However, these results are not
immediate. For instance, laws of large numbers do exist for non-φ-irreducible stationary
processes (cf. again Meyn and Tweedie 2009, Theorem 17.1.2), and show that the averages
of bounded functionals converge. However, the value to which they converge may depend
on the initialization of the process. (It may be possible to obtain correct limits of time aver-
ages by restricting the class of functions under consideration, or by obtaining additional
mixing results for the time series under consideration).”
I am going to review the work on statistical inference for the direct approach and the
perturbed approach in Sections 4.3.1 and 4.3.2, respectively, but first let me give some general
comments on possible advantages and drawbacks of the two methods.
An obvious advantage of the direct method is that it only uses properties of the process
defined by the original model itself. A possible drawback is that the process must be assumed
to be in its stationary state, granted that the conditions for the existence of a stationary state
are fulfilled. This makes it possible to extend the process to t = 0, ±1, ±2, .... It is not always
straightforward to link a likelihood based on a stationary solution to a likelihood depending on a
given initial condition. An approximation argument seems to be needed as in Wang et al.
(2014). In the perturbation method the process can be started from an arbitrary initial point,
because when the process is perturbed, one typically obtains a φ-irreducible geometric
ergodic process, and the inference theory for such processes does not require the process to
be in a stationary state. The geometric ergodicity drives the process towards its stationary
state asymptotically at a geometric rate. A disadvantage of this approach is the mere fact
that the perturbed process is just an intermediate step and a different, but in some cases
similar, approximation argument to that mentioned earlier for the likelihoods is needed.
Both methods require a type of contracting condition. When it comes to the problem of
estimating parameters, the maximum likelihood estimates for the two methods are usually
identical, but in deriving properties of the estimates the methods may differ and require
arguments of different complexity.
4.3.1 Asymptotic Estimation Theory without Perturbation
This is a new topic, so the literature is not extensive. The parameter estimation problem
has been treated by Davis and Liu (2014), Wang et al. (2014), Douc et al. (2013), Woodard
et al. (2011), and Christou and Fokianos (2014). These authors have somewhat different
choices of methods, but the main methodological difference seems to consist in the way the
contraction condition is established. Once this is established, fairly standard consistency
and likelihood arguments are used. Other aspects of statistical inference are treated by Wu
and Shao (2004), Fokianos and Neumann (2013), Fokianos and Fried (2010), Christou and
93 Count Time Series with Observation-Driven Autoregressive Parameter Dynamics
Fokianos (2013, 2015). I will concentrate on the parameter estimation here, and then briefly
mention other aspects.
Davis and Liu (2014) treat a general model
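$$Y_t \mid \mathcal{F}_{t-1} \sim p(y \mid \eta_t), \qquad X_t = g_\theta(X_{t-1}, Y_{t-1}),$$
where $\mathcal{F}_t = \sigma\{\eta_1, Y_1, \ldots, Y_t\}$, $X_t = E(Y_t \mid \mathcal{F}_{t-1}) = B(\eta_t)$, and $p(\cdot \mid \eta)$ is a distribution from a one-parameter exponential family
$$p(y \mid \eta) = \exp\{\eta y - A(\eta)\}\,h(y),$$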
where η is the natural parameter (e.g., ln λ in a Poisson distribution), and A(η) and h(y) are
known functions. Many familiar distributions belong to this family, such as the Poisson,
negative binomial, Bernoulli, exponential. In Davis and Liu (2014) the Poisson case and the
negative binomial with parameters $r$ and $p_t$ are treated. For the latter the integer parameter $r$ is fixed, and the probability parameter $p_t$ is allowed to be stochastic and recursively updated. Using a contraction condition on $g$ and the backward process of Diaconis and Freedman (1999), they establish the existence of a unique stationary measure and prove that the process $\{X_t\}$ is geometric moment contracting, as in Wu and Shao (2004), and this in turn is used to establish that $\{Y_t\}$ is absolutely regular and $\{X_t, Y_t\}$ is ergodic, as did Neumann (2011) using other means. The likelihood function is given by
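$$L(\theta \mid Y_1, \ldots, Y_n; \eta_1) = \prod_{t=1}^{n} \exp\{\eta_t(\theta) Y_t - A(\eta_t(\theta))\}\, h(Y_t).$$
Taking logs and differentiating, the score function is obtained:
$$S_n(\theta) = \frac{\partial l(\theta)}{\partial \theta} = \sum_{t=1}^{n} \{Y_t - B(\eta_t(\theta))\}\, \frac{\partial \eta_t}{\partial \theta},$$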
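where $B$ is the derivative of $A$, that is $B(\eta) = \dot{A}(\eta)$. Note that $\dot{B}(\eta) = \mathrm{var}(Y_t) > 0$ so that $B(\eta)$ is strictly increasing. Moreover, $X_t = B(\eta_t) = E(Y_t \mid \mathcal{F}_{t-1})$, $\eta_t = B^{-1}(X_t)$, and $X_t = g_\theta(Y_{t-1}, \ldots)$ by expanding $\{Y_t\}$ back to $-\infty$ as explained in Section 4.2.4. If we let $\hat{\theta}_n$ be a solution of $S_n(\theta) = 0$, then under a string of regularity conditions given in Davis and Liu (2014), strong consistency is obtained and asymptotic normality of $\hat{\theta}_n$ is demonstrated, that is
$$\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{L} N(0, \Omega^{-1}),$$
where $\Omega = E\{\dot{B}(\eta_t)\,(\partial\eta_t/\partial\theta)(\partial\eta_t/\partial\theta)'\}$ and where $'$ is used to denote the transposed. In the Poisson case $\eta_t = \ln \lambda_t$ and
$$L(\theta) = \prod_{t} \frac{\exp(-\lambda_t(\theta))\,\lambda_t(\theta)^{Y_t}}{Y_t!}, \qquad A(\eta_t) = e^{\eta_t(\theta)}, \qquad \text{and} \quad \Omega = E\left\{ e^{\eta_t(\theta)}\, \frac{\partial \eta_t(\theta)}{\partial\theta} \left(\frac{\partial \eta_t(\theta)}{\partial\theta}\right)' \right\},$$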
which corresponds to the asymptotic distribution in the log linear case in Fokianos and
Tjøstheim (2011).
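To make the Poisson score concrete, here is a minimal Python sketch, not the authors' code. It assumes the log linear recursion $\nu_t = d + a\nu_{t-1} + b\ln(Y_{t-1}+1)$ with $\eta_t = \nu_t$, so that $B(\eta_t) = e^{\eta_t}$. The function name, the fixed start value `nu0`, and the choice to condition on the first observation are all illustrative assumptions.

```python
import math


def loglinear_loglik_and_score(theta, y, nu0=0.0):
    """Poisson log-likelihood (dropping ln Y_t!) and score for a log linear model.

    theta = (d, a, b); eta_t = nu_t = d + a*nu_{t-1} + b*ln(Y_{t-1} + 1).
    nu0 is a hypothetical fixed start value; y[0] is only conditioned on.
    """
    d, a, b = theta
    nu_prev = nu0
    dnu_prev = [0.0, 0.0, 0.0]          # d nu_{t-1} / d theta
    loglik = 0.0
    score = [0.0, 0.0, 0.0]
    for t in range(1, len(y)):
        z = math.log(y[t - 1] + 1.0)
        nu = d + a * nu_prev + b * z
        # Chain rule through the recursion: direct term plus a * previous derivative.
        dnu = [1.0 + a * dnu_prev[0],
               nu_prev + a * dnu_prev[1],
               z + a * dnu_prev[2]]
        lam = math.exp(nu)              # B(eta_t) = exp(eta_t)
        loglik += y[t] * nu - lam       # eta_t * Y_t - A(eta_t)
        for i in range(3):
            score[i] += (y[t] - lam) * dnu[i]   # {Y_t - B(eta_t)} * d eta_t / d theta
        nu_prev, dnu_prev = nu, dnu
    return loglik, score
```

A quick sanity check is to compare the returned score with a central finite difference of the log-likelihood; the two should agree to several digits for any admissible parameter value.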
Consistency is proved using standard arguments, whereas asymptotic normality is
proved by a linearization argument and the fact that $n^{-1/2}\{Y_t - B(\eta_t(\theta))\}\,\partial\eta_t/\partial\theta$ is a
martingale difference sequence.
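The martingale difference property can be spelled out in one line. The following is a sketch at the true parameter value, using only the definitions of this section; the shorthand $s_t(\theta)$ for a single score term is introduced here for convenience and is not the authors' notation. Since $\eta_t(\theta)$ and its derivative are $\mathcal{F}_{t-1}$-measurable and $E(Y_t \mid \mathcal{F}_{t-1}) = B(\eta_t)$:

```latex
s_t(\theta) = \{Y_t - B(\eta_t(\theta))\}\,\frac{\partial \eta_t(\theta)}{\partial \theta},
\qquad
E\{s_t(\theta) \mid \mathcal{F}_{t-1}\}
  = \{E(Y_t \mid \mathcal{F}_{t-1}) - B(\eta_t(\theta))\}\,
    \frac{\partial \eta_t(\theta)}{\partial \theta}
  = 0,
\qquad
E\{s_t(\theta)\, s_t(\theta)' \mid \mathcal{F}_{t-1}\}
  = \dot{B}(\eta_t)\,
    \frac{\partial \eta_t(\theta)}{\partial \theta}
    \left(\frac{\partial \eta_t(\theta)}{\partial \theta}\right)'.
```

Taking expectations in the last identity gives the matrix whose inverse appears as the covariance of the limiting normal distribution.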
Christou and Fokianos (2014) look at the negative binomial case too, and they approach
the problem somewhat differently. They use the fact that an appropriate mixture of Poisson
distributions yields a negative binomial distribution, and obtain a one-parameter negative
binomial distribution in that way. Subsequently, this is used in treating overdispersion in
a natural manner.
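The mixture fact used here can be checked numerically. The sketch below assumes the standard Poisson-gamma construction, in which $Y \mid Z \sim \text{Poisson}(Z\lambda)$ and $Z$ is gamma distributed with shape $\nu$ and rate $\nu$ (mean 1). Whether this matches the exact parameterization of Christou and Fokianos (2014) is an assumption, and the function names are illustrative.

```python
import math


def nb_pmf(y, lam, nu):
    """Negative binomial pmf with mean lam and dispersion parameter nu."""
    log_p = (math.lgamma(nu + y) - math.lgamma(y + 1) - math.lgamma(nu)
             + nu * math.log(nu / (nu + lam)) + y * math.log(lam / (nu + lam)))
    return math.exp(log_p)


def poisson_gamma_mixture_pmf(y, lam, nu, upper=20.0, steps=40000):
    """Integrate Poisson(y; z*lam) against a Gamma(shape=nu, rate=nu) density in z.

    Uses a plain midpoint rule on (0, upper]; intended for nu >= 1, where the
    integrand is bounded near zero.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        z = (k + 0.5) * h
        log_poisson = y * math.log(z * lam) - z * lam - math.lgamma(y + 1)
        log_gamma = (nu * math.log(nu) + (nu - 1) * math.log(z)
                     - nu * z - math.lgamma(nu))
        total += math.exp(log_poisson + log_gamma)
    return total * h
```

Under this parameterization the mixture has mean $\lambda$ and variance $\lambda + \lambda^2/\nu$, which is how the single extra parameter $\nu$ captures overdispersion relative to the Poisson case.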
Somewhat different arguments to those employed by Davis and Liu (2014) were used
in Wang et al. (2014) in a threshold-like model. The existence of a stationary measure is
established using the weak Feller and e-chain properties.
Douc et al. (2013) restrict themselves to proving consistency in a model
$$Y_t \mid \mathcal{F}_{t-1} \sim H(X_{t-1}, \cdot), \qquad X_t = F_{Y_t}(X_{t-1}),$$
where $H$ is a Markov kernel and $\mathcal{F}_t = \sigma(X_s, Y_s,\ s \le t,\ s \in \mathbb{N})$. The authors have a series of regularity conditions, among them the asymptotic strong Feller property (Hairer and Mattingly 2006), existence of a reachable point and a contraction condition set up in a coupling context. They use these conditions to establish a unique invariant measure for the Markov kernel under consideration.
4.3.2 The Perturbation Approach
This approach is carried out in Fokianos et al. (2009), Fokianos and Tjøstheim (2011, 2012)
and referred to in Woodard et al. (2011). In those papers it is restricted to the Poisson case
for the distribution of the innovations, but I believe that it has wider potential. In the linear
case, the essence of the method is to set up a perturbed model (4.18), (4.19) in addition to the
original model (4.4), (4.5) to obtain a model that is geometrically ergodic and analogously
for the log linear and nonlinear models. Next, this is used to construct two likelihoods, and
then to show that their difference tends to zero as the size of the perturbation decreases to
zero. This is finally utilized to show that the asymptotics of the parameter estimates of the
original model are obtained as limits of the asymptotics of the perturbed model.
We will illustrate this for the nonlinear model of Fokianos and Tjøstheim (2012):
$$Y_t = N_t(\lambda_t), \qquad \lambda_t = f_1(\lambda_{t-1}) + f_2(Y_{t-1}),$$
with assumptions on $f_1$ and $f_2$ as already stated in Section 4.2.7. The corresponding perturbed model is given by
$$Y_t^m = N_t(\lambda_t^m), \qquad \lambda_t^m = f_1(\lambda_{t-1}^m) + f_2(Y_{t-1}^m) + \varepsilon_{t,m},$$
$$\varepsilon_{t,m} = c_m 1(Y_{t-1}^m = 1)\,U_t, \qquad c_m > 0, \qquad U_t \sim \text{iid } U[0, 1],$$
where $c_m \to 0$ as $m \to \infty$, and where other possibilities of perturbation exist.
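A small simulation can help fix ideas about how the perturbed and unperturbed systems are coupled. The sketch below is illustrative only. It assumes the linear special case $f_1(\lambda) = d + a\lambda$, $f_2(y) = by$ with $d > 0$ and $a, b \ge 0$, drives both systems with the same unit-rate Poisson process $N_t$ and the same $U_t$, and lets the indicator act on the previous perturbed count. All function names, start values and the seed are hypothetical.

```python
import random


def poisson_counts(rng, levels):
    """Counts of one unit-rate Poisson process on [0, level], for each level."""
    top = max(levels)
    arrivals = []
    s = rng.expovariate(1.0)
    while s <= top:
        arrivals.append(s)
        s += rng.expovariate(1.0)
    return [sum(1 for x in arrivals if x <= level) for level in levels]


def simulate_coupled(n, d, a, b, c_m, seed=0, lam0=1.0, y0=0):
    """Return a list of (lambda_t, Y_t, lambda_t^m, Y_t^m) for t = 1, ..., n."""
    rng = random.Random(seed)
    lam, y = lam0, y0            # unperturbed state
    lam_m, y_m = lam0, y0        # perturbed state, same start (an assumption)
    path = []
    for _ in range(n):
        u = rng.random()                        # U_t ~ U[0, 1]
        eps = c_m * u if y_m == 1 else 0.0      # c_m * 1(Y_{t-1}^m = 1) * U_t
        lam = d + a * lam + b * y
        lam_m = d + a * lam_m + b * y_m + eps
        # Both counts are read off the same Poisson process N_t.
        y, y_m = poisson_counts(rng, [lam, lam_m])
        path.append((lam, y, lam_m, y_m))
    return path
```

With `c_m=0` the two paths coincide exactly. For small positive `c_m`, averaging $|\lambda_t^m - \lambda_t|$ and $|Y_t^m - Y_t|$ along the path gives an informal view of how close the two systems stay under a contraction condition such as $a + b < 1$.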
The likelihoods are given by
$$L(\theta) = \prod_{t=1}^{n} \frac{\exp(-\lambda_t(\theta))\,\lambda_t(\theta)^{Y_t}}{Y_t!}$$
and
$$L_m(\theta) = \prod_{t=1}^{n} \frac{\exp(-\lambda_t^m(\theta))\,(\lambda_t^m(\theta))^{Y_t^m}}{Y_t^m!}\ \prod_{t=1}^{n} f_U(U_t)$$
by the Poisson assumption and the assumed independence of $U_t$ from $(Y_{t-1}^m, \lambda_{t-1}^m)$, with $f_U(u)$ denoting the uniform density. Moreover, the log likelihood and the scores are given by
$$l_m(\theta) = \sum_{t=1}^{n} \left(Y_t^m \ln \lambda_t^m(\theta) - \lambda_t^m(\theta)\right) + \sum_{t=1}^{n} \ln f_U(U_t) - \sum_{t=1}^{n} \ln Y_t^m!$$
and
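$$S_n^m(\theta) = \frac{\partial l_m(\theta)}{\partial\theta} = \sum_{t=1}^{n} \left(\frac{Y_t^m}{\lambda_t^m(\theta)} - 1\right) \frac{\partial\lambda_t^m(\theta)}{\partial\theta}.$$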
The Hessian is given by
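$$H_n^m(\theta) = -\frac{\partial^2 l_m(\theta)}{\partial\theta\,\partial\theta'} = \sum_{t=1}^{n} \frac{Y_t^m}{(\lambda_t^m(\theta))^2}\, \frac{\partial\lambda_t^m(\theta)}{\partial\theta} \left(\frac{\partial\lambda_t^m(\theta)}{\partial\theta}\right)' - \sum_{t=1}^{n} \left(\frac{Y_t^m}{\lambda_t^m(\theta)} - 1\right) \frac{\partial^2\lambda_t^m(\theta)}{\partial\theta\,\partial\theta'}.$$
Using $E(Y_t^m \mid \mathcal{F}_{t-1}) = \lambda_t^m$, the expectation of $n^{-1} H_n^m$ is
$$G^m(\theta) = E\left\{\frac{1}{\lambda_t^m}\, \frac{\partial\lambda_t^m}{\partial\theta} \left(\frac{\partial\lambda_t^m}{\partial\theta}\right)'\right\}.$$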
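Exactly the same calculation can be carried through for the unperturbed system,
resulting in
$$G(\theta) = E\left\{\frac{1}{\lambda_t}\, \frac{\partial\lambda_t}{\partial\theta} \left(\frac{\partial\lambda_t}{\partial\theta}\right)'\right\}.$$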
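For a concrete view of the score and of a sample analogue of $G(\theta)$, the following Python sketch treats the linear recursion $\lambda_t = d + a\lambda_{t-1} + bY_{t-1}$ with $\theta = (d, a, b)$. This particular recursion, the fixed start value `lam0`, and the function name are assumptions made for illustration. The derivative $\partial\lambda_t/\partial\theta$ is propagated through the recursion by the chain rule.

```python
import math


def linear_loglik_score_info(theta, y, lam0=1.0):
    """Log-likelihood (without ln Y_t!), score, and averaged information matrix.

    Model: lambda_t = d + a*lambda_{t-1} + b*Y_{t-1}, theta = (d, a, b), d > 0.
    lam0 is a hypothetical fixed start value; y[0] is only conditioned on.
    """
    d, a, b = theta
    lam_prev = lam0
    dlam_prev = [0.0, 0.0, 0.0]          # d lambda_{t-1} / d theta
    loglik = 0.0
    score = [0.0, 0.0, 0.0]
    info = [[0.0] * 3 for _ in range(3)]
    n = len(y) - 1
    for t in range(1, len(y)):
        lam = d + a * lam_prev + b * y[t - 1]
        direct = [1.0, lam_prev, float(y[t - 1])]
        dlam = [direct[i] + a * dlam_prev[i] for i in range(3)]
        loglik += y[t] * math.log(lam) - lam
        for i in range(3):
            # Score term: (Y_t / lambda_t - 1) * d lambda_t / d theta
            score[i] += (y[t] / lam - 1.0) * dlam[i]
            for j in range(3):
                # Sample average of (1 / lambda_t) * dlam * dlam'
                info[i][j] += dlam[i] * dlam[j] / (lam * n)
        lam_prev, dlam_prev = lam, dlam
    return loglik, score, info
```

Inverting the returned matrix and dividing by the number of terms would give a plug-in approximation to the covariance of the estimates, under the regularity conditions discussed in the text.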
Asymptotic normality for the parameter estimates of the perturbed system follows from
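standard asymptotic theory. Asymptotic normality with covariance matrix $G^{-1}$ for the unperturbed system will follow from an approximation theorem of Brockwell and Davis (1987), Proposition 6.3.9, if it can be shown that $G^m(\theta)$ converges to $G(\theta)$ when $m \to \infty$ and the perturbation tends to zero as $c_m \to 0$. It is in this argument that the contraction assumption is needed. One needs to evaluate differences
$$E\left(\frac{\partial\lambda_t^m}{\partial\theta} - \frac{\partial\lambda_t}{\partial\theta}\right) \quad \text{and} \quad E\left\{\frac{1}{\lambda_t^m}\, \frac{\partial\lambda_t^m}{\partial\theta} \left(\frac{\partial\lambda_t^m}{\partial\theta}\right)' - \frac{1}{\lambda_t}\, \frac{\partial\lambda_t}{\partial\theta} \left(\frac{\partial\lambda_t}{\partial\theta}\right)'\right\}$$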
and using the regularity conditions of Fokianos and Tjøstheim (2012) these are transformed
to showing
$$E|\lambda_t^m - \lambda_t| \to 0, \qquad E(\lambda_t^m - \lambda_t)^2 \to 0,$$
$$E|Y_t^m - Y_t| \to 0, \qquad E(Y_t^m - Y_t)^2 \to 0,$$
and almost surely
$$|\lambda_t^m - \lambda_t| \to 0, \qquad |Y_t^m - Y_t| \to 0.$$
For a proof of these we refer to Fokianos and Tjøstheim (2012).
Similar techniques can be used in the log linear case, where
$$Y_t = N_t(\lambda_t), \qquad \nu_t = d + a\nu_{t-1} + b\ln(Y_{t-1} + 1), \qquad \lambda_t = \exp(\nu_t).$$
In this case there is a large discrepancy between the conditions required for approximating
the perturbed system to the unperturbed system and the conditions required for geometric
ergodicity of
$$Y_t^m = N_t(\lambda_t^m), \qquad \nu_t^m = d + a\nu_{t-1}^m + b\ln(Y_{t-1}^m + 1) + \varepsilon_{t,m},$$
where $\nu_0^m$, $Y_0^m$ are fixed and $\{\varepsilon_{t,m}\}$ is as in (4.19). For geometric ergodicity it is sufficient that $|a| < 1$ and in addition if $b > 0$, then $|a + b| < 1$, and when $b < 0$, $|a||a + b| < 1$. These conditions are much milder than the corresponding condition $|a + b| < 1$ for the linear system (4.4), (4.5). This is because for the log linear system the process can flip back and forth between the negative and positive domain. To obtain global contraction, where the perturbed likelihood can be compared to the unperturbed one, much stricter conditions are required. In Fokianos and Tjøstheim (2012), a set of sufficient conditions was obtained by requiring $|a + b| < 1$ if $a$ and $b$ have the same sign and $(a^2 + b^2) < 1$ if $a$ and $b$ have different signs. In Douc et al. (2013), who did not consider the asymptotic distribution but did consider contraction, this is improved, and they obtained the sufficient condition $|a + b| \vee |a| \vee |b| < 1$.
The perturbation approach is also possible for non-Poisson processes. For the model
considered by Davis and Liu (2014), for example, one must show that the geometric
moment contractivity is enough to prove that the unperturbed likelihood can be approx-
imated by a perturbed likelihood. I conjecture that this is the case under weak regularity
conditions.
Finally, summing up all this, the perturbation approach and the direct unperturbed
approach are not mutually exclusive. The direct approach gives the fundamental
probability properties of the process. Geometric ergodicity is not present, and the analysis,
at least in a time series sense, requires a nonstandard Markov chain theory. The perturbation
approach is only used in the inference stage. Asymptotic results are first found for the
perturbed system. The geometric ergodicity means that the process is not required to be in its
stationary state, and in a time series sense more standard Markov theory can be used.