80 Handbook of Discrete-Valued Time Series
for some (usually nonnegative) functions $\{g_i\}$ and $\{h_i\}$. So far, I do not know about any
systematic attempts to analyse models such as (4.8) except in the case where the $g_i$'s and $h_i$'s
are linear functions and some special nonlinear models treated by Davis and Liu (2014).
Clearly there are many other possibilities, and in the course of this paper we will put
various restrictions on f.
The other category of specification has to do with the choice of distribution p in (4.1)
and the choice of parametrization of this distribution. The parametrization is not unique
and could have to do with the characterization of the exponential family of distributions,
where there are canonical choices of parameters. For example, instead of the intensity $\lambda_t$
as a choice of parameter, one could choose the canonical parameter $\nu_t = \ln \lambda_t$ in the
Poisson case. For a binary process with probability parameter $p_t$, one could choose the
parameter $\alpha_t = \ln(p_t/(1 - p_t))$. One obvious advantage of using $\nu_t$ instead of
$\lambda_t$ in the Poisson linear case is that in an equation

$$\nu_t = d + a\nu_{t-1} + b \ln(Y_{t-1} + 1)$$

corresponding to (4.5), it is no longer required that the parameters d, a, b be nonnegative,
since $\nu_t$ itself can take negative values, and in a sense it is easier to implement explanatory
variables. We have treated such processes in Fokianos and Tjøstheim (2011).
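To make the point about sign restrictions concrete, the following is a minimal simulation sketch of the log-linear recursion, assuming Poisson observations with intensity $\exp(\nu_t)$. The function names, the starting value `nu0`, the parameter values, and the simple Poisson sampler are illustrative assumptions, not taken from the cited work.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method.

    Adequate for the moderate intensities used here; not meant for large lam.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_log_linear(d, a, b, n, nu0=0.0, seed=0):
    """Simulate nu_t = d + a*nu_{t-1} + b*ln(Y_{t-1} + 1), Y_t ~ Poisson(exp(nu_t)).

    nu0 and seed are hypothetical conveniences for this sketch.
    """
    rng = random.Random(seed)
    nu = nu0
    y = poisson_draw(rng, math.exp(nu))
    nus, ys = [nu], [y]
    for _ in range(n - 1):
        nu = d + a * nu + b * math.log(y + 1)
        y = poisson_draw(rng, math.exp(nu))  # exp(nu) > 0 for any real nu
        nus.append(nu)
        ys.append(y)
    return nus, ys


# All three parameters are negative, yet every intensity exp(nu_t) is valid.
nus, ys = simulate_log_linear(d=-0.5, a=-0.4, b=-0.3, n=200)
```

With these values $\nu_t$ is negative at the first update, which would be impossible for $\lambda_t$ in the linear parametrization, while the implied intensity remains strictly positive.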
We will be concerned with both the probabilistic structure of the system (4.1), (4.2) and
the asymptotic inference theory for estimates of the parameters characterizing
the autoregressive process $\{X_t\}$. Examples of parameters that have to be estimated are
a, b, d in the linear case (4.5).
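As an illustration of what estimating a, b, d can involve, here is a hedged sketch of a conditional Poisson log-likelihood for the linear recursion $\lambda_t = d + a\lambda_{t-1} + bY_{t-1}$. The likelihood-based route, the function name, and the starting value `lam0` are assumptions made for this example; the text above does not commit to a particular estimator.

```python
import math


def poisson_loglik(params, ys, lam0=1.0):
    """Conditional Poisson log-likelihood for lam_t = d + a*lam_{t-1} + b*Y_{t-1}.

    params is (d, a, b); lam0 is an assumed starting intensity.
    Returns -inf outside the region d > 0, a >= 0, b >= 0, where the
    linear recursion could produce a non-positive intensity.
    """
    d, a, b = params
    if d <= 0 or a < 0 or b < 0:
        return float("-inf")
    lam, total = lam0, 0.0
    for t in range(1, len(ys)):
        lam = d + a * lam + b * ys[t - 1]
        # log Poisson pmf: y*ln(lam) - lam - ln(y!)
        total += ys[t] * math.log(lam) - lam - math.lgamma(ys[t] + 1)
    return total


# Hypothetical count series, used only to show the call.
example_counts = [2, 0, 1, 3, 1, 0, 2]
value = poisson_loglik((0.3, 0.4, 0.5), example_counts)
```

A numerical optimizer could maximize this function over (d, a, b). The explicit sign check reflects the nonnegativity requirement of the linear parametrization.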
Somewhat different techniques have been used to characterize the probabilistic structure,
but common to most of them is Markov chain theory. This is of course an integral
part of (nonlinear) AR processes (see Tjøstheim 1990), but it is made more difficult and
nonstandard in the present case due to the incompatibility problems of values of $\{X_t\}$ or
$\{\lambda_t\}$ as compared to $\{Y_t\}$.
One technique for obtaining asymptotic results for the parameter estimates uses standard
Markov chain theory by perturbing the original recursive relationship (4.2) or (4.5)
with a continuously distributed perturbation $\varepsilon_t$, so that in the linear case one
obtains the perturbed equation

$$\lambda_t = d + a\lambda_{t-1} + bY_{t-1} + \varepsilon_t \tag{4.9}$$

and then letting this perturbation tend to zero. This is perhaps not as direct an approach as
one could wish for (cf. Doukhan 2012 for a critical view), but it leads relatively efficiently to
results (Fokianos et al. 2009). A disadvantage of this approach is that one is not concerned
so much with the probabilistic structure of the processes $(\{X_t\}, \{Y_t\})$ themselves but rather
with that of the perturbed versions. To look at the probabilistic structure of (4.1), (4.2) and more
specifically (4.4), (4.5), again there are different approaches. We will highlight all of this as
we proceed.
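The perturbation idea can be sketched numerically as follows. This is only an illustration under stated assumptions: the perturbation is taken to be $\varepsilon_t = c\,U_t$ with $U_t$ uniform on (0, 1), which keeps the intensity positive, and the names `simulate_perturbed`, `c`, and `lam0` are hypothetical. The perturbation used in the cited work may be constructed differently.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_perturbed(d, a, b, c, n, lam0=1.0, seed=0):
    """Simulate lam_t = d + a*lam_{t-1} + b*Y_{t-1} + eps_t with eps_t = c*U_t.

    U_t ~ Uniform(0, 1) is an assumed perturbation law for this sketch.
    Separate random streams drive the counts and the perturbation, so that
    c = 0 gives the unperturbed recursion exactly.
    """
    rng_y = random.Random(seed)
    rng_eps = random.Random(seed + 1)
    lam = lam0
    y = poisson_draw(rng_y, lam)
    lams, ys = [lam], [y]
    for _ in range(n - 1):
        eps = c * rng_eps.random()
        lam = d + a * lam + b * y + eps
        y = poisson_draw(rng_y, lam)
        lams.append(lam)
        ys.append(y)
    return lams, ys


# Shrinking c mimics letting the perturbation tend to zero.
paths = {c: simulate_perturbed(0.3, 0.4, 0.5, c, n=100) for c in (0.1, 0.01, 0.0)}
```

Because $\varepsilon_t$ has a continuous distribution, the perturbed intensity takes values in a continuum, and setting `c = 0.0` gives back the original linear recursion.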
As mentioned, we will cover both the probabilistic structure and the theory of inference.
But the emphasis will be on the former because there are recent review papers, Fokianos
(2012), Tjøstheim (2012), and Fokianos (2015; Chapter 1 in this volume), with focus on
statistical inference and applications. We will start with a discussion of the existence of a
stationary, that is, invariant, probability measure for $(\{X_t\}, \{Y_t\})$. This problem is fundamental
for most of what follows, such as ergodicity, irreducibility, and recurrence, and hence
for statistical inference. These topics are treated in Section 4.2. We draw the connection to