Markov Models for Count Time Series
There is a body of theory that covers several count time series models when $f_Y$ is convolution-closed and infinitely divisible (CCID), because these properties allow a joint multivariate distribution to be constructed and lead to thinning operators. This theory provides a bridge between the thinning operator approach of Section 2.3 and the general Markov approach with copulas in Section 2.5.
The operators have been studied in specific discrete cases by McKenzie (1985, 1986, 1988), Al-Osh and Alzaid (1987), and Alzaid and Al-Osh (1993), and in a more general framework in Joe (1996) and Jørgensen and Song (1998).
The general operator is presented first for the Markov order 1 case; it is then mentioned how the construction extends to higher-order Markov and $q$-dependent series, and how covariates can be accommodated. For this construction, the Markov order 1 model has a linear conditional expectation, but the models of Markov order 2 or higher do not.
Let $\{F(\cdot;\theta) : \theta > 0\}$ be a CCID parametric family such that $F(\cdot;\theta_1) * F(\cdot;\theta_2) = F(\cdot;\theta_1+\theta_2)$, where $*$ is the convolution operator; $F(\cdot;0)$ corresponds to the degenerate distribution at 0.
For $X_j \sim F(\cdot;\theta_j)$, $j = 1, 2$, with $X_1, X_2$ independent, let $H(\cdot;\theta_1,\theta_2,y)$ be the distribution of $X_1$ given that $X_1 + X_2 = y$. Let $R(\cdot) = R(\cdot;\alpha,\theta)$ ($0 < \alpha \le 1$) be a random operator such that $R(Y)$ given $Y = y$ has distribution $H(\cdot;\alpha\theta,(1-\alpha)\theta,y)$, and $R(Y) \sim F(\cdot;\alpha\theta)$ when $Y \sim F(\cdot;\theta)$.
A stationary time series with margin $F(\cdot;\theta)$ and autocorrelation $0 < \alpha < 1$ (at lag 1) can be constructed as
$$Y_t = R_t(Y_{t-1}) + \epsilon_t, \qquad R_t(y_{t-1}) \sim H(\cdot;\alpha\theta,(1-\alpha)\theta,\,y_{t-1}), \qquad (2.11)$$
since $F(\cdot;\theta) = F(\cdot;\theta\alpha) * F(\cdot;\theta(1-\alpha))$, when the innovations $\epsilon_t$ are independent and identically distributed with distribution $F(\cdot;(1-\alpha)\theta)$. Note that $\{R_t : t \ge 1\}$ are independent replications of the operator $R$.
The intuitive reasoning is as follows. A consecutive pair $(Y_{t-1}, Y_t)$ has a common latent or unobserved component $X_{12}$ through the stochastic representation
$$Y_{t-1} = X_{12} + X_1, \qquad Y_t = X_{12} + X_2,$$
where $X_{12}, X_1, X_2$ are independent random variables with distributions $F(\cdot;\alpha\theta)$, $F(\cdot;(1-\alpha)\theta)$, $F(\cdot;(1-\alpha)\theta)$, respectively. The operator $R_t(Y_{t-1})$ "recovers" the unobserved common component $X_{12}$; hence the distribution of $R_t(y)$ given $Y_{t-1} = y$ must be the same as the distribution of $X_{12}$ given $X_{12} + X_1 = y$.
Examples of CCID operators for the infinitely divisible Poisson, NB, and GP distributions are given below.
1. If $F(\cdot;\theta)$ is Po($\theta$), then $H(\cdot;\alpha\theta,(1-\alpha)\theta,y)$ is Bin($y,\alpha$). The resulting operator is binomial thinning (a simulation sketch of this case and the next is given below, after the list).
2. If $F(\cdot;\theta) = F_{\rm NB}(\cdot;\theta,\xi)$ with fixed $\xi > 0$, then $H(\cdot;\alpha\theta,(1-\alpha)\theta,y)$, or $\Pr(X_1 = x \mid X_1 + X_2 = y)$ with $X_j$ independently NB$(\theta_j,\xi)$, is Beta-binomial$(y,\alpha\theta,(1-\alpha)\theta)$, independent of $\xi$. The pmf of $H$ is
$$h(x;\theta_1,\theta_2,y) = \binom{y}{x}\,\frac{B(\theta_1+x,\,\theta_2+y-x)}{B(\theta_1,\theta_2)}, \qquad x = 0, 1, \ldots, y.$$
The operator matches the random coefficient thinning in Section 2.3, but not binomial thinning or generalized thinning. This first appeared in McKenzie (1986). For (2.11) based on this operator, $E[Y_t \mid Y_{t-1} = y] = \alpha y + (1-\alpha)\theta\xi$, and
$$\mathrm{Var}(Y_t \mid Y_{t-1} = y) = (1-\alpha)\theta\xi(1+\xi) + \frac{y(\theta+y)\alpha(1-\alpha)}{\theta+1}.$$
The conditional variance is quadratically increasing in $y$ for large $y$, and hence this process has more conditional heteroscedasticity than those based on compounding operators in Section 2.3.
3. If $F(\cdot;\theta) = F_{\rm GP}(\cdot;\theta,\eta)$ with $0 < \eta < 1$ fixed, then $H(\cdot;\alpha\theta,(1-\alpha)\theta,y)$, or $\Pr(X_1 = x \mid X_1 + X_2 = y)$ with $X_j$ independently GP$(\theta_j,\eta)$, is a quasi-binomial distribution with parameters $\pi = \theta_1/(\theta_1+\theta_2)$ and $\zeta = \eta/(\theta_1+\theta_2)$. The quasi-binomial pmf is
$$h(x;\pi,\zeta,y) = \binom{y}{x}\,\frac{\pi(1-\pi)}{1+\zeta y}\,\Big(\frac{\pi+\zeta x}{1+\zeta y}\Big)^{x-1}\Big(\frac{1-\pi+\zeta(y-x)}{1+\zeta y}\Big)^{y-x-1},$$
for $x = 0, 1, \ldots, y$. For (2.11) with this operator, $E[Y_t \mid Y_{t-1} = y] = \alpha y + (1-\alpha)\theta/(1-\eta)$ and
$$\mathrm{Var}[Y_t \mid Y_{t-1} = y] = \alpha(1-\alpha)\Big[y^2 - \sum_{j=0}^{y-2}\frac{y!\,\zeta^j}{(y-j-2)!\,(1+y\zeta)^{j+1}}\Big] + \frac{(1-\alpha)\theta}{(1-\eta)^3}, \qquad \zeta = \eta/\theta;$$
see Alzaid and Al-Osh (1993). Numerically this is superlinear and asymptotically $O(y^2)$ as $y \to \infty$.
These operators can be used for INMA(q) and INARMA(1, q) models in an analogous
manner to the models in (2.8) and (2.9).
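To make construction (2.11) concrete, below is a minimal simulation sketch (assuming NumPy is available; the parameter values are illustrative, not from the text) of the Poisson chain of case 1 and the NB chain of case 2, with empirical checks that the stationary mean and lag 1 autocorrelation match $\theta$ (or $\theta\xi$) and $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(12345)

def sim_poisson_ccid(n, theta, alpha):
    """Markov order 1 CCID chain (2.11) with Po(theta) margin:
    R_t is binomial thinning Bin(y, alpha); innovations are Po((1-alpha)*theta)."""
    y = np.empty(n, dtype=np.int64)
    y[0] = rng.poisson(theta)                      # start at the stationary margin
    for t in range(1, n):
        r = rng.binomial(y[t - 1], alpha)          # H(.; a*th, (1-a)*th, y) = Bin(y, alpha)
        y[t] = r + rng.poisson((1 - alpha) * theta)
    return y

def sim_nb_ccid(n, theta, xi, alpha):
    """Markov order 1 CCID chain with NB(theta, xi) margin (mean theta*xi):
    R_t given y is Beta-binomial(y, alpha*theta, (1-alpha)*theta)."""
    p = 1.0 / (1.0 + xi)                           # NB success probability for mean theta*xi
    y = np.empty(n, dtype=np.int64)
    y[0] = rng.negative_binomial(theta, p)
    for t in range(1, n):
        q = rng.beta(alpha * theta, (1 - alpha) * theta)  # latent beta coefficient
        r = rng.binomial(y[t - 1], q)                     # beta-binomial "thinning"
        y[t] = r + rng.negative_binomial((1 - alpha) * theta, p)
    return y

def lag1_acf(y):
    z = y - y.mean()
    return (z[1:] * z[:-1]).sum() / (z * z).sum()

y_po = sim_poisson_ccid(100_000, theta=4.0, alpha=0.5)
y_nb = sim_nb_ccid(100_000, theta=4.0, xi=1.5, alpha=0.5)
print(y_po.mean(), lag1_acf(y_po))  # ~ 4.0 (= theta) and ~ 0.5 (= alpha)
print(y_nb.mean(), lag1_acf(y_nb))  # ~ 6.0 (= theta*xi) and ~ 0.5 (= alpha)
```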
Next, we present the Markov order 2 extension of Joe (1996) and Jung and Tremayne (2011). Consider the following model for three consecutive observations:
$$Y_{t-2} = X_{123} + X_{12} + X_{13} + X_1,$$
$$Y_{t-1} = X_{123} + X_{12} + X_{23} + X_2, \qquad (2.12)$$
$$Y_t = X_{123} + X_{23} + X_{13} + X_3,$$
where $X_1, X_2, X_3, X_{12}, X_{13}, X_{23}, X_{123}$ are independent random variables with distributions in the family $F(\cdot;\theta)$ with respective convolution parameters $\theta_1^* = \theta - \theta_0 - \theta_1 - \theta_2$, $\theta_2^* = \theta - \theta_0 - 2\theta_1$, $\theta_1^*$, $\theta_1$, $\theta_2$, $\theta_1$, $\theta_0$; that is, $X_{12}$ and $X_{23}$ share the parameter $\theta_1$, $X_{13}$ has $\theta_2$, and $X_{123}$ has $\theta_0$ (the parameters are constrained so that $\theta_1^*, \theta_2^*$ are nonnegative). The conditional probability $\Pr(Y_t = y_{\rm new} \mid Y_{t-1} = y_{\rm prev1}, Y_{t-2} = y_{\rm prev2})$ does not lead to a simple operator as in the Markov order 1 case, so computationally one can just use
$$\Pr(Y_t = w_3 \mid Y_{t-1} = w_2, Y_{t-2} = w_1) = \frac{\Pr(Y_{t-2} = w_1,\, Y_{t-1} = w_2,\, Y_t = w_3)}{\Pr(Y_{t-2} = w_1,\, Y_{t-1} = w_2)}.$$
The numerator involves a quadruple sum:
$$\sum_{x_{123}=0}^{w_1 \wedge w_2 \wedge w_3}\ \sum_{x_{12}=0}^{(w_1-x_{123}) \wedge (w_2-x_{123})}\ \sum_{x_{23}=0}^{(w_3-x_{123}) \wedge (w_2-x_{123}-x_{12})}\ \sum_{x_{13}=0}^{(w_1-x_{123}-x_{12}) \wedge (w_3-x_{123}-x_{23})} f(x_{123};\theta_0)\, f(x_{12};\theta_1)\, f(x_{23};\theta_1)\, f(x_{13};\theta_2)$$
$$\cdot\, f(w_1-x_{123}-x_{12}-x_{13};\theta_1^*)\, f(w_2-x_{123}-x_{12}-x_{23};\theta_2^*)\, f(w_3-x_{123}-x_{23}-x_{13};\theta_1^*).$$
For a model simplification, let $\theta_2 = 0$ (the convolution parameter of $X_{13}$) so that $X_{13} = 0$; then the numerator becomes a triple sum. Setting $X_{13} = 0$ is sufficient to get a one-parameter extension of Markov order 1; this Markov order 2 model becomes the Markov order 1 model when $\theta_0 = \alpha^2\theta$, $\theta_1 = \alpha(1-\alpha)\theta$, $\theta_1^* = (1-\alpha)\theta$, $\theta_2^* = (1-\alpha)^2\theta$.
When $\theta_2 = 0$, and $\alpha_1 \ge \alpha_2 \ge 0$ are the autocorrelations at lags 1 and 2, the time series model has a stochastic representation:
$$Y_t = R_t(Y_{t-1}, Y_{t-2}) + \epsilon_t, \qquad \epsilon_t \sim F(\cdot;(1-\alpha_1)\theta), \qquad (2.13)$$
where, via (2.12), $R_t(y_{t-1}, y_{t-2})$ has the conditional distribution of $X_{123} + X_{12} + X_{23}$ given $Y_{t-1} = y_{t-1}, Y_{t-2} = y_{t-2}$, and the convolution parameters of $X_{123}, X_{12}, X_1, X_2$ are, respectively, $\alpha_2\theta$, $(\alpha_1-\alpha_2)\theta$, $(1-\alpha_1)\theta$, $(1-2\alpha_1+\alpha_2)\theta$, with $\alpha_2 \ge 2\alpha_1 - 1$. If $\theta_2 > 0$, then the convolution parameters of $X_{123}, X_{12}, X_{13}, X_1, X_2$ are, respectively, $\theta_0, \theta_1, \theta_2, \theta_1^*, \theta_2^*$, with $\alpha_1 = (\theta_1 + \theta_0)/\theta$ and $\alpha_2 = (\theta_2 + \theta_0)/\theta$.
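As a computational sketch of the simplified model with $X_{13} = 0$ (the Poisson pmf and the parameter values are illustrative assumptions for this example, not specifications from the text), the transition probability can be evaluated by brute-force summation using the $(\theta, \alpha_1, \alpha_2)$ parameterization of (2.13):

```python
from scipy.stats import poisson

def joint3_pmf(w1, w2, w3, theta, a1, a2):
    """Pr(Y_{t-2}=w1, Y_{t-1}=w2, Y_t=w3) for the order 2 CCID model with X_13 = 0
    and Poisson margins, via the triple sum (requires a1 >= a2 >= 0 and a2 >= 2*a1 - 1)."""
    th0, th1 = a2 * theta, (a1 - a2) * theta                  # parameters of X_123; X_12, X_23
    th1s, th2s = (1 - a1) * theta, (1 - 2 * a1 + a2) * theta  # parameters of X_1, X_3; X_2
    total = 0.0
    for x123 in range(min(w1, w2, w3) + 1):
        for x12 in range(min(w1, w2) - x123 + 1):
            for x23 in range(min(w2 - x12, w3) - x123 + 1):
                total += (poisson.pmf(x123, th0) * poisson.pmf(x12, th1)
                          * poisson.pmf(x23, th1)
                          * poisson.pmf(w1 - x123 - x12, th1s)
                          * poisson.pmf(w2 - x123 - x12 - x23, th2s)
                          * poisson.pmf(w3 - x123 - x23, th1s))
    return total

def joint2_pmf(w1, w2, theta, a1):
    """Pr(Y_{t-1}=w1, Y_t=w2): the common component X_123 + X_12 has parameter a1*theta."""
    return sum(poisson.pmf(x, a1 * theta)
               * poisson.pmf(w1 - x, (1 - a1) * theta)
               * poisson.pmf(w2 - x, (1 - a1) * theta)
               for x in range(min(w1, w2) + 1))

def trans_prob(w3, w2, w1, theta=3.0, a1=0.5, a2=0.3):
    """Pr(Y_t = w3 | Y_{t-1} = w2, Y_{t-2} = w1)."""
    return joint3_pmf(w1, w2, w3, theta, a1, a2) / joint2_pmf(w1, w2, theta, a1)

# sanity check: the conditional pmf sums to ~1
print(sum(trans_prob(w3, 2, 4) for w3 in range(60)))
```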
The pattern extends to higher-order Markov models, but numerically the transition probability becomes too cumbersome, because the most general $p$-dimensional distribution of this type involves $2^p - 1$ independent $X_S$ for $S$ a nonempty subset of $\{1, \ldots, p\}$. As mentioned by Jung and Tremayne (2011), the autocorrelation structure of this Markov model for $p \ge 2$ with Poisson, NB, or GP margins does not mimic the Gaussian counterpart, because of a nonlinear conditional mean function.
Because the distribution of the innovation is in the same family as the stationary marginal distribution, the models can be extended easily so that the convolution parameter of $Y_t$ is $\theta_t$, which depends on time-varying covariates. For example, for the Markov order 1 model, with $Y_t \sim F(\cdot;\theta_t)$, take $R_t(Y_{t-1}) \sim F(\cdot;\alpha\theta_{t-1})$ and $\epsilon_t \sim F(\cdot;\zeta_t)$ with $\zeta_t = \theta_t - \alpha\theta_{t-1} \ge 0$ (Joe 1997, Section 8.4.4). For NB and GP, this means the univariate regression models are NB1 and GP1, respectively.
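Here is a sketch of this covariate extension for the Poisson case; the log link $\theta_t = \exp(\beta_0 + \beta_1 z_t)$, the seasonal covariate, and the coefficient values are illustrative assumptions, not specifications from the text.

```python
import numpy as np

rng = np.random.default_rng(7)

def sim_poisson_covariates(z, beta0, beta1, alpha):
    """Markov order 1 Poisson chain with time-varying convolution parameter
    theta_t = exp(beta0 + beta1 * z_t) and innovation parameter
    zeta_t = theta_t - alpha * theta_{t-1}, which must be nonnegative."""
    theta = np.exp(beta0 + beta1 * np.asarray(z))
    zeta = theta[1:] - alpha * theta[:-1]
    if np.any(zeta < 0):
        raise ValueError("need theta_t >= alpha * theta_{t-1} for all t")
    y = np.empty(len(theta), dtype=np.int64)
    y[0] = rng.poisson(theta[0])
    for t in range(1, len(theta)):
        y[t] = rng.binomial(y[t - 1], alpha) + rng.poisson(zeta[t - 1])
    return y

z = np.sin(np.linspace(0.0, 6 * np.pi, 500))   # hypothetical seasonal covariate
y = sim_poisson_covariates(z, beta0=1.0, beta1=0.3, alpha=0.4)
print(y[:20])
```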
2.5 Copula-Based Transition
The copula modeling approach is a way to get a joint distribution for $(Y_{t-p}, \ldots, Y_t)$ without an assumption of infinite divisibility. Hence the univariate margin can be any distribution in the stationary case. However, the property of linear conditional expectation for the Markov order 1 process is lost. For a $(p+1)$-variate copula $C_{1:(p+1)}$, $F_{1:(p+1)} = C_{1:(p+1)}(F_Y, \ldots, F_Y)$ is a model for the multivariate discrete distribution of $(Y_{t-p}, \ldots, Y_t)$.
For stationarity, the marginal copulas satisfy $C_{1:m} = C_{(1+i):(m+i)}$ for $i = 1, \ldots, p+1-m$ and $m = 2, \ldots, p$. The resulting transition probability $\Pr(Y_t = y_t \mid Y_{t-p} = y_{t-p}, \ldots, Y_{t-1} = y_{t-1})$ can be computed from $F_{1:(p+1)}$. If $Y$ were a continuous random variable, there would be a simple stochastic representation for the copula-based Markov model in terms of U(0,1) random variables, but this is not the case for $Y$ discrete.
If there are time-varying covariates $z_t$ so that $F_{Y_t} = F(\cdot;\beta,z_t)$, then one can use $F_{1:(p+1)} = C_{1:(p+1)}(F_{Y_{t-p}}, \ldots, F_{Y_t})$ for the distribution of $(Y_{t-p}, \ldots, Y_t)$ with Markov dependence and a time-varying parameter in the univariate margin.
For $q$-dependence, one can get a time series model $\{F_Y^{-1}(U_t)\}$ with stationary margin $F_Y$ if $\{U_t\}$ is a $q$-dependent sequence of U(0,1) random variables. For mixed Markov/$q$-dependence, a copula model that combines features of Markov and $q$-dependence can be defined. Chapter 8 of Joe (1997) has copula time series models for Markov dependence and 1-dependence.
More specic details of parametric models are given for Markov order 1, followed by
brief mention of higher-order Markov, q-dependent and mixed Markov/q-dependent.
For a stationary time series model with stationary univariate distribution $F_Y$, let $F_{12} = C(F_Y, F_Y; \delta)$ be the distribution of $(Y_{t-1}, Y_t)$, where $C$ is a bivariate copula family with dependence parameter $\delta$. Then the transition probability $\Pr(Y_t = y_t \mid Y_{t-1} = y_{t-1})$ is
$$f_{2|1}(y_t \mid y_{t-1}) = \frac{F_{12}(y_{t-1}, y_t) - F_{12}(y_{t-1}^-, y_t) - F_{12}(y_{t-1}, y_t^-) + F_{12}(y_{t-1}^-, y_t^-)}{f_Y(y_{t-1})},$$
where $y_i^-$ is shorthand for $y_i - 1$ for $i = t-1$ and $t$.
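As an illustration (a sketch assuming SciPy is available; the NB margin, the bivariate Frank copula of Example 2 below, and the parameter values are demonstration choices), the transition pmf can be computed directly from this rectangle formula:

```python
import numpy as np
from scipy.stats import nbinom

def frank_cdf(u1, u2, delta):
    """Bivariate Frank copula cdf, delta != 0."""
    return -np.log1p(-(1 - np.exp(-delta * u1)) * (1 - np.exp(-delta * u2))
                     / (1 - np.exp(-delta))) / delta

def trans_pmf(y_t, y_prev, n, p, delta):
    """f_{2|1}(y_t | y_prev) for a stationary NB(n, p) margin linked by a Frank copula."""
    F = lambda y: nbinom.cdf(y, n, p) if y >= 0 else 0.0   # F_Y(y), with F_Y(-1) = 0
    rect = (frank_cdf(F(y_prev), F(y_t), delta)
            - frank_cdf(F(y_prev - 1), F(y_t), delta)
            - frank_cdf(F(y_prev), F(y_t - 1), delta)
            + frank_cdf(F(y_prev - 1), F(y_t - 1), delta))
    return rect / nbinom.pmf(y_prev, n, p)

# sanity check: transition probabilities sum to ~1
print(sum(trans_pmf(y, 3, n=4.0, p=0.5, delta=5.0) for y in range(200)))
```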
Below are a few examples of one-parameter copula models that include independence, perfect positive dependence, and possibly an extension to negative dependence. Different tail behavior of the copula leads to different asymptotic tail behavior of the conditional expectation and variance, but the conditional expectation is roughly linear in the middle. If a copula $C$ is the distribution of a bivariate uniform vector $(U_1, U_2)$, then the distribution of the reflection $(1-U_1, 1-U_2)$ is $\widehat{C}(u_1, u_2) := u_1 + u_2 - 1 + C(1-u_1, 1-u_2)$. The copula $C$ is reflection symmetric if $C = \widehat{C}$. Otherwise, for a reflection asymmetric bivariate copula $C$, one can also consider $\widehat{C}$ as a model with the opposite direction of tail asymmetry.
The bivariate Gaussian copula can be considered as a baseline model from which other copula families deviate in tail behavior. Based on Jeffreys' and Kullback–Leibler divergences for $(Y_1, Y_2)$ that are NB or GP, the bivariate distributions $F_{12}$ arising from the binomial thinning operator, from the beta-binomial or quasi-binomial operators, and from the Gaussian copula are very similar, with typically a sample size of over 500 needed to distinguish the models when the (lag 1) correlation is moderate (0.4–0.7).
Below is a summary of bivariate copula families with different tail properties and hence different tail behavior of the conditional mean $E(Y_t \mid Y_{t-1} = y)$ and variance $\mathrm{Var}(Y_t \mid Y_{t-1} = y)$ as $y \to \infty$, when $F_Y = F_{\rm NB}$ or $F_{\rm GP}$ (minimal cdf sketches of these families follow the list).
1. Bivariate Gaussian: reflection symmetric; with $\Phi$, $\Phi_2$ being the univariate and bivariate Gaussian cdfs with mean 0 and variance 1, $C(u_1, u_2; \rho) = \Phi_2(\Phi^{-1}(u_1), \Phi^{-1}(u_2); \rho)$, $-1 < \rho < 1$. The conditional mean is asymptotically slightly sublinear and the conditional variance is asymptotically close to linear.
2. Bivariate Frank: reflection symmetric, $C(u_1, u_2; \delta) = -\delta^{-1}\log[1 - (1-e^{-\delta u_1})(1-e^{-\delta u_2})/(1-e^{-\delta})]$, $-\infty < \delta < \infty$. Because the upper tail behaves like $1 - u_1 - u_2 + C(u_1, u_2) \sim \zeta(1-u_1)(1-u_2)$ for some $\zeta > 0$ as $u_1, u_2 \to 1^-$, the conditional mean and variance are asymptotically flat.
3. Bivariate Gumbel: reflection asymmetric with stronger dependence in the joint upper tail, $C(u_1, u_2; \delta) = \exp\{-[(-\log u_1)^\delta + (-\log u_2)^\delta]^{1/\delta}\}$ for $\delta \ge 1$. The conditional mean is asymptotically linear and the conditional variance is asymptotically sublinear.
4. Reected or survival Gumbel: reection asymmetric with stronger dependence in the
joint lower tail. C(u
1
, u
2
; δ) = u
1
+ u
2
1 + exp{−[( log{1 u
1
})
δ
+ ( log{1
u
2
})
δ
]
1/δ
} for δ 1. The conditional mean and variances are asymptotically
sublinear.
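For reference, minimal cdf sketches of these four copulas are given below (SciPy's bivariate normal cdf is used for the Gaussian case); any of them can be substituted into the transition probability formula given earlier in this section.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def gaussian_cdf(u1, u2, rho):
    """Bivariate Gaussian copula, -1 < rho < 1."""
    z = [norm.ppf(u1), norm.ppf(u2)]
    return multivariate_normal.cdf(z, mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def frank_cdf(u1, u2, delta):
    """Bivariate Frank copula, delta != 0 (delta -> 0 gives independence)."""
    return -np.log1p(-(1 - np.exp(-delta * u1)) * (1 - np.exp(-delta * u2))
                     / (1 - np.exp(-delta))) / delta

def gumbel_cdf(u1, u2, delta):
    """Bivariate Gumbel copula, delta >= 1; upper tail dependence."""
    return np.exp(-((-np.log(u1)) ** delta + (-np.log(u2)) ** delta) ** (1 / delta))

def refl_gumbel_cdf(u1, u2, delta):
    """Reflected (survival) Gumbel copula, delta >= 1; lower tail dependence."""
    return u1 + u2 - 1 + gumbel_cdf(1 - u1, 1 - u2, delta)
```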
The Gumbel or reected Gumbel copula can be recommended when there is some tail
asymmetry relative to Gaussian. The Gumbel copula can be recommended when it is
expected that there is some clustering of large values (exceeding a large threshold).
The Frank copula is the simplest copula that is reflection symmetric and can allow negative dependence. However, its bivariate joint upper and lower tails are lighter than those of the Gaussian copula, and this implies that the conditional expectation $E(Y_t \mid Y_{t-1} = y)$ converges to a constant for large $y$. For Gaussian, Gumbel, and reflected Gumbel, $E(Y_t \mid Y_{t-1} = y)$ is asymptotically linear or sublinear for large $y$ for $\{Y_t\}$ with a stationary distribution that is exponentially decreasing (like NB and GP). Some of these results can be proved with the techniques in Hua and Joe (2013).
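A numerical check of these tail statements is sketched below (an illustrative NB(4, 0.5) margin with mean 4, and illustrative copula parameters; the Frank conditional mean should level off while the Gumbel one keeps growing roughly linearly):

```python
import numpy as np
from scipy.stats import nbinom

def frank_cdf(u1, u2, d):
    return -np.log1p(-(1 - np.exp(-d * u1)) * (1 - np.exp(-d * u2)) / (1 - np.exp(-d))) / d

def gumbel_cdf(u1, u2, d):
    return np.exp(-((-np.log(u1)) ** d + (-np.log(u2)) ** d) ** (1 / d))

def cond_mean(y_prev, cop, par, n=4.0, p=0.5, ymax=4000):
    """E(Y_t | Y_{t-1} = y_prev) for a stationary NB(n, p) margin under copula `cop`,
    summing y * f_{2|1}(y | y_prev) over a long grid of y values."""
    ys = np.arange(ymax)
    F = nbinom.cdf(ys, n, p)
    Fm = np.concatenate(([1e-300], F[:-1]))      # F(y - 1); tiny floor avoids log(0)
    u_hi = nbinom.cdf(y_prev, n, p)
    u_lo = max(nbinom.cdf(y_prev - 1, n, p), 1e-300)
    rect = (cop(u_hi, F, par) - cop(u_lo, F, par)
            - cop(u_hi, Fm, par) + cop(u_lo, Fm, par))
    return (ys * rect).sum() / nbinom.pmf(y_prev, n, p)

for y in (10, 20, 40):                            # far in the right tail of the margin
    print(y, cond_mean(y, frank_cdf, 5.0), cond_mean(y, gumbel_cdf, 1.8))
```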
For second-order Markov chains, one just needs a trivariate copula that satisfies the stationarity condition $C_{12} = C_{23}$; a good choice is the trivariate Gaussian copula with lag 1 and lag 2 latent correlations $\rho_1, \rho_2$, respectively. If closed-form copula functions are desired, a class to consider has the form
$$C_{\psi,H}(u_1, u_2, u_3) = \psi\Big(\sum_{j\in\{1,3\}}\Big[-\log H\big(e^{-0.5\,\psi^{-1}(u_j)},\, e^{-0.5\,\psi^{-1}(u_2)}\big) + \tfrac{1}{2}\,\psi^{-1}(u_j)\Big]\Big),$$
where $\psi$ is the Laplace transform of a positive random variable and $H$ is a bivariate permutation-symmetric max-infinitely divisible copula; it has bivariate margins
$$C_{j2} = \psi\Big(-\log H\big(e^{-0.5\,\psi^{-1}(u_j)},\, e^{-0.5\,\psi^{-1}(u_2)}\big) + \tfrac{1}{2}\,\psi^{-1}(u_j) + \tfrac{1}{2}\,\psi^{-1}(u_2)\Big), \qquad j = 1, 3,$$
and $C_{13}(u_1, u_3) = \psi\big(\psi^{-1}(u_1) + \psi^{-1}(u_3)\big)$. This $C_{\psi,H}$ is a suitable copula, with closed-form cdf, for the transition of a stationary time series of Markov order 2 when there is more dependence for measurements at nearer time points. If a model with clustering of large values is desired, then one can take $H$ to be the bivariate Gumbel copula and $\psi$ to be the positive stable Laplace transform; then $C_{\psi,H}$ is a trivariate extreme value copula. Other simple choices, used for the data set in Section 2.6, are the Frank copula for $H$ together with the positive stable or logarithmic series Laplace transform for $\psi$.
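Below is a sketch of $C_{\psi,H}$ with the positive stable Laplace transform $\psi(s) = \exp(-s^{1/\vartheta})$, $\vartheta \ge 1$, and bivariate Gumbel $H$ (the trivariate extreme value case just mentioned; the parameter values are illustrative), together with a check of the stated $C_{13}$ margin:

```python
import numpy as np

def gumbel_cdf(u1, u2, delta):
    """Bivariate Gumbel copula, delta >= 1."""
    return np.exp(-((-np.log(u1)) ** delta + (-np.log(u2)) ** delta) ** (1 / delta))

def psi(s, vt):
    """Positive stable Laplace transform, vt >= 1."""
    return np.exp(-s ** (1.0 / vt))

def psi_inv(t, vt):
    return (-np.log(t)) ** vt

def c_psi_h(u1, u2, u3, vt, delta):
    """Trivariate copula C_{psi,H} with positive stable psi and Gumbel H."""
    s = 0.0
    for uj in (u1, u3):
        s += (-np.log(gumbel_cdf(np.exp(-0.5 * psi_inv(uj, vt)),
                                 np.exp(-0.5 * psi_inv(u2, vt)), delta))
              + 0.5 * psi_inv(uj, vt))
    return psi(s, vt)

# margin check: setting u2 = 1 should recover the Archimedean C_13
u1, u3, vt, d = 0.3, 0.8, 1.5, 2.0
print(c_psi_h(u1, 1.0, u3, vt, d), psi(psi_inv(u1, vt) + psi_inv(u3, vt), vt))
```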
Both the Gaussian copula and $C_{\psi,H}$ can be extended to AR($p$). Other alternatives for copulas for Markov order $p \ge 2$ are based on discrete D-vines (see Panagiotelis et al. 2012).
For copula versions of Gaussian MA($q$) series, analogues of (2.8) can be constructed for dependent U(0,1) sequences, which can then be converted with the inverse probability transform $F_Y^{-1}$. For a $q$-dependent sequence, a $(q+1)$-variate copula $K_{1:(q+1)}$ is needed with margin $K_{1:q}$ being the independence copula. Then $K_{1:(q+1)}$ is the copula of $(\eta_{t-q}, \ldots, \eta_{t-1}, U_t)$, where $\{\eta_t\}$ is a sequence of independent U(0,1) innovation random variables, and
$$U_t = K^{-1}_{q+1\mid 1:q}(\eta_t \mid \eta_{t-q}, \ldots, \eta_{t-1}).$$
Here $K_{q+1\mid 1:q}$ is the conditional distribution of variable $q+1$ given variables $1, \ldots, q$.
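For instance, a 1-dependent U(0,1) sequence ($q = 1$) can be generated by this recursion; the sketch below uses a bivariate Frank copula for $K_{1:2}$ (an illustrative choice, not the chapter's prescription), whose conditional cdf has a closed-form inverse:

```python
import numpy as np

rng = np.random.default_rng(99)

def frank_cond_inv(p, u1, delta):
    """Solve K_{2|1}(u2 | u1) = p for u2, where K is the bivariate Frank copula
    with parameter delta != 0; the conditional cdf inverts in closed form."""
    a1 = 1 - np.exp(-delta * u1)
    b = 1 - np.exp(-delta)
    x = p * b / (np.exp(-delta * u1) + p * a1)
    return -np.log1p(-x) / delta

def one_dependent_uniforms(n, delta):
    """U_t = K^{-1}_{2|1}(eta_t | eta_{t-1}) with iid U(0,1) eta's.
    Each U_t depends only on (eta_{t-1}, eta_t), so {U_t} is 1-dependent."""
    eta = rng.uniform(size=n + 1)
    return frank_cond_inv(eta[1:], eta[:-1], delta)

u = one_dependent_uniforms(100_000, delta=8.0)
# lag 1 correlation is well away from 0; lag 2 is ~0, as 1-dependence requires
print(np.corrcoef(u[:-1], u[1:])[0, 1], np.corrcoef(u[:-2], u[2:])[0, 1])
```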