(b) Under expectation thinning compounding and a GDSD marginal (with pgf $G_Y(s)$), the stationary time series model is (2.2), where the innovation $\epsilon_t$ has pgf $G_Y(s)/G_Y(G_K(s; \alpha))$.
Definition 2.2 (Self-generalized $\{K(\alpha)\}$). Consider a family of $K(\alpha) \sim F_K(\cdot\,; \alpha)$ with $E[K(\alpha)] = \alpha$ and pgf $G_K(s; \alpha) = E[s^{K(\alpha)}]$, $\alpha \in [0, 1]$. Then $\{F_K(\cdot\,; \alpha)\}$ is self-generalized iff
$$G_K(G_K(s; \alpha); \alpha') = G_K(s; \alpha\alpha'), \quad \forall\, \alpha, \alpha' \in (0, 1).$$
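For example, binomial thinning ($K(\alpha) \sim \mathrm{Bernoulli}(\alpha)$) satisfies this identity, since the composition of two linear pgfs is again linear. A minimal symbolic check (an illustrative sketch, not part of the source):

```python
import sympy as sp

s, a1, a2 = sp.symbols('s alpha1 alpha2', positive=True)

def G_K(s, a):
    # Binomial-thinning pgf: K(alpha) ~ Bernoulli(alpha)
    return (1 - a) + a * s

# Semigroup identity of Definition 2.2: G_K(G_K(s; a1); a2) = G_K(s; a1*a2)
assert sp.simplify(G_K(G_K(s, a1), a2) - G_K(s, a1 * a2)) == 0
```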
For binomial thinning, the class of possible margins is called the discrete self-decomposable (DSD) class. Note that unless $Y$ is Poisson and $\{K(\alpha)\}$ corresponds to binomial thinning, the distribution of the innovation is in a different parametric family than $F_Y$.
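To illustrate the Poisson exception (a sketch assuming a Poisson($\mu$) marginal; the algebra is standard, not specific to this chapter): dividing the Poisson pgf by its binomially thinned version, as in the innovation pgf of part (b), shows the innovation is again Poisson, with mean $\mu(1-\alpha)$.

```python
import sympy as sp

s, alpha, mu = sp.symbols('s alpha mu', positive=True)

G_Y = sp.exp(mu * (s - 1))      # Poisson(mu) marginal pgf
G_K = (1 - alpha) + alpha * s   # binomial-thinning pgf

# Innovation pgf G_Y(s)/G_Y(G_K(s; alpha)) from part (b)
G_eps = sp.simplify(G_Y / G_Y.subs(s, G_K))
print(G_eps)  # exp(mu*(1 - alpha)*(s - 1)), i.e., Poisson(mu*(1 - alpha))
```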
The terminology of self-generalizability is used in Zhu and Joe (2010b), and the concept is called a semigroup operator in Van Harn and Steutel (1993). Zhu and Joe (2010a) show (1) that $\mathrm{Var}[K(\alpha)] = \sigma^2_{K(\alpha)} = a_K\,\alpha(1-\alpha)$, where $a_K \ge 1$ for a self-generalized family $\{K(\alpha)\}$, and (2) that generalized thinning operators without self-generalizability lack some closure properties. Also, self-generalizability is a nice property for embedding into a continuous-time process.
For NB, Zhu and Joe (2010b) show that NB($\theta, \xi$) is GDSD for three self-generalized thinning operators that are given below. For NB($\theta, \xi$), with the parametrization given in Section 2.2 (so that $\pi = 1/(1+\xi)$), the pgf is $G_{\mathrm{NB}}(s; \theta, \xi) = [\pi/\{1 - (1-\pi)s\}]^{\theta}$, for $s > 0$, $\theta > 0$ and $\xi > 0$. Three types of thinning operators based on $\{K(\alpha)\}$ are given below in terms of the pgf, together with $\mathrm{Var}[K(\alpha)]$; the second operator (I2) has been used by various authors in several different parametrizations, and the specification is simplest via pgfs. The different $\{K(\alpha)\}$ families allow different degrees of conditional heteroscedasticity; a symbolic check of their moments follows the list.
(I1) (binomial thinning) $G_K(s; \alpha) = (1-\alpha) + \alpha s$, with $\mathrm{Var}[K(\alpha)] = \alpha(1-\alpha)$.
(I2) $G_K(s; \alpha; \gamma) = \dfrac{(1-\alpha) + (\alpha-\gamma)s}{(1-\alpha\gamma) - (1-\alpha)\gamma s}$, $0 \le \gamma < 1$, with $\mathrm{Var}[K(\alpha)] = \alpha(1-\alpha)(1+\gamma)/(1-\gamma)$. Note that $\gamma = 0$ implies $G_K(s; \alpha) = (1-\alpha) + \alpha s$.
(I3) $G_K(s; \alpha; \gamma) = \gamma^{-1}\left[1 + \gamma - (1 + \gamma - \gamma s)^{\alpha}\right]$, $\gamma \ge 0$, with $\mathrm{Var}[K(\alpha)] = \alpha(1-\alpha)(1+\gamma)$. Note that $\gamma \to 0$ implies $G_K(s; \alpha) = (1-\alpha) + \alpha s$.
For NB($\theta, \xi$), GDSD with respect to I2($\gamma$) holds for $0 \le \gamma \le 1 - \pi = \xi/(1+\xi)$, and GDSD with respect to I3($\gamma$) holds for $0 \le \gamma \le (1-\pi)/\pi = \xi$. For GP($\theta, \eta$), the DSD property is shown in Zhu and Joe (2003), and it can be shown that GP($\theta, \eta$) is GDSD with respect to I2($\gamma(\eta)$), where $\gamma(\eta)$ increases as the overdispersion $\eta$ increases. Note that the GP distribution does not have a closed-form pgf.
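A numerical way to see the I2($\gamma$) bound for NB (an illustrative sketch with parameter values chosen arbitrarily here, using $\pi = 1/(1+\xi)$): GDSD means the innovation pgf $G_{\mathrm{NB}}(s)/G_{\mathrm{NB}}(G_K(s; \alpha))$ must have nonnegative power-series coefficients, which can be spot-checked with a truncated expansion.

```python
import sympy as sp

s = sp.symbols('s')

# NB(theta, xi) pgf with pi = 1/(1 + xi); theta = 2, xi = 2 are arbitrary
theta, xi = 2, sp.Rational(2)
pi = 1 / (1 + xi)
G_NB = (pi / (1 - (1 - pi) * s))**theta

# I2 thinning pgf; the GDSD bound here is gamma <= 1 - pi = 2/3
alpha, gamma = sp.Rational(1, 2), sp.Rational(1, 2)
G_K = ((1 - alpha) + (alpha - gamma) * s) \
      / ((1 - alpha * gamma) - (1 - alpha) * gamma * s)

# Innovation pgf; a valid pgf has nonnegative series coefficients
G_eps = sp.simplify(G_NB / G_NB.subs(s, G_K))
poly = sp.series(G_eps, s, 0, 12).removeO()
print(all(poly.coeff(s, k) >= 0 for k in range(12)))  # expect True
```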
2.3.4 Estimation
For parameter estimation in count time series models, a common approach is conditional least squares (CLS). This involves minimizing $\sum_{i=2}^{n} \bigl(y_i - E[Y_i \mid y_{i-1}, y_{i-2}, \ldots]\bigr)^2$ for a time series of length $n$. For a stationary model, it is straightforward to get point estimators of $\mu_Y$ and some autocorrelation parameters. One problem with CLS is that it cannot distinguish overdispersed Poisson models for $\epsilon_t$ and $Y_t$. For example, if a NB or GP time series is assumed with one of the above generalized thinning operators, then the overdispersion cannot be reliably estimated with an extra moment equation after CLS.
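As a concrete illustration (a minimal sketch for the simplest case, a Poisson INAR(1) with binomial thinning; the parameter values and seed are arbitrary): since $E[Y_i \mid y_{i-1}] = \alpha y_{i-1} + \mu_\epsilon$ is linear, CLS reduces to ordinary least squares of $y_i$ on $y_{i-1}$, giving point estimates of $\alpha$ and $\mu_Y = \mu_\epsilon/(1-\alpha)$. Note that an overdispersion parameter would not enter this conditional mean, which is the identifiability problem just described.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a Poisson INAR(1): Y_i = alpha o Y_{i-1} + eps_i, with binomial
# thinning and Poisson(mu_Y * (1 - alpha)) innovations (arbitrary settings)
alpha_true, mu_Y, n = 0.5, 4.0, 2000
y = np.empty(n, dtype=np.int64)
y[0] = rng.poisson(mu_Y)
for i in range(1, n):
    y[i] = (rng.binomial(y[i - 1], alpha_true)
            + rng.poisson(mu_Y * (1 - alpha_true)))

# CLS: minimize sum_{i=2}^n (y_i - alpha*y_{i-1} - mu_eps)^2, i.e., OLS
X = np.column_stack([np.ones(n - 1), y[:-1]])
mu_eps_hat, alpha_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
print(alpha_hat, mu_eps_hat / (1 - alpha_hat))  # approx alpha_true and mu_Y
```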