2
Markov Models for Count Time Series
Harry Joe
CONTENTS
2.1 Introduction ............................................................ 29
2.2 Models for Count Data ................................................... 30
2.3 Thinning Operators ...................................................... 32
    2.3.1 Analogues of Gaussian AR(p) ....................................... 34
    2.3.2 Analogues of Gaussian Autoregressive Moving Average ............... 36
    2.3.3 Classes of Generalized Thinning Operators ......................... 36
    2.3.4 Estimation ........................................................ 37
    2.3.5 Incorporation of Covariates ....................................... 38
2.4 Operators in Convolution-Closed Class ................................... 38
2.5 Copula-Based Transition ................................................. 41
2.6 Statistical Inference and Model Comparisons ............................. 44
References .................................................................. 47
2.1 Introduction
The focus of this chapter is on the construction of count time series models based on thin-
ning operators or a joint distribution on consecutive observations, and comparison of the
properties of the resulting models.
The models for count time series considered here are mainly intended for low counts
with the possibility of 0. If all counts are large and "far from 0," the models considered
here can be used, as well as models that treat the count response as continuous.
Count data are often overdispersed relative to Poisson. There are many count regression
models with covariates; examples are regression models with negative binomial (NB),
generalized Poisson (GP), zero-inflated Poisson, zero-inflated NB, etc.
If the count data are observed as a time series sequence, then the count regression model
can be adapted in two ways: (1) add previous observations as covariates and (2) make use
of some models for stationary count time series. Methodology for case (1) is covered in
Davis et al. (2000) and Fokianos (2011), and here, we discuss the quite different methodol-
ogy for case (2). The advantage of a time series regression model with univariate margins
corresponding to a count regression model is that predictions as a function of covariates can
be made with or without preceding observations. That is, this is useful if one is primarily
interested in regression but with time-dependent observations.
30 Handbook of Discrete-Valued Time Series
Common parametric models for count regression are NB and GP, and these include Pois-
son regression at the boundary. In this chapter, we use these two count regression models
for concreteness, but some approaches, such as copula-based models, can accommodate
other count distributions.
The remainder of the chapter is organized as follows. Section 2.2 summarizes some
count regression models and contrasts some properties of count time series models
constructed under different approaches. Sections 2.3 through 2.5 provide some details for
count time series models based, respectively, on thinning operators, multivariate
distributions with random variables in a convolution-closed infinitely divisible class, and
copulas for consecutive observations. Section 2.6 compares the fits of different models for
one data set.
Some conventions and notation that are subsequently used are as follows: f is used for
probability mass functions (pmfs) and F is used for cumulative distribution functions (cdfs),
with the subscript used to indicate the margin or random vector; Σ_{i=1}^{y} k_i = 0 when
y = 0; N_0 is the set of nonnegative integers; ε_t is used for the innovation at time t (that
is, ε_t is independent of random variables at times t − 1, t − 2, ... in the stochastic
representation).
2.2 Models for Count Data
In this section, we show how NB and GP distributions have been used for count regression
and count time series.
NB and GP regression models are nonunique in how regression coefficients β are introduced
into the univariate parameters. If the mean is assumed to be loglinear in covariates,
there does not exist a unique model, because the mean involves the convolution parameter
and a second parameter that links to overdispersion.
Brief details are summarized as follows, with F_NB and F_GP denoting the cdfs and f_NB
and f_GP denoting the pmfs:
1. (NB): θ convolution parameter, π probability parameter, ξ = π^{−1} − 1 ≥ 0; mean
   μ = θξ = θ(1 − π)/π, variance σ² = μ(1 + ξ) = θ(1 − π)/π², and

       f_NB(y; θ, ξ) = [Γ(θ + y) / (Γ(θ) y!)] · ξ^y / (1 + ξ)^{θ+y},   y = 0, 1, 2, ...,  θ > 0, ξ > 0.

   If θ → ∞ and ξ → 0 with θξ fixed, the Poisson distribution is obtained.
2. (GP): θ convolution parameter, second parameter 0 ≤ η < 1; mean μ = θ/(1 − η),
   variance σ² = θ/(1 − η)³, and

       f_GP(y; θ, η) = θ(θ + ηy)^{y−1} e^{−θ−ηy} / y!,   y = 0, 1, 2, ...,  θ > 0, 0 ≤ η < 1.

   If η = 0, the Poisson distribution is obtained.
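As a numerical sanity check of the two pmfs and the stated mean/variance formulas — a minimal sketch, with the function names f_nb and f_gp our own and the supports truncated at y = 400:

```python
from math import exp, lgamma, log

def f_nb(y, theta, xi):
    # NB pmf: Gamma(theta + y) / (Gamma(theta) * y!) * xi^y / (1 + xi)^(theta + y)
    return exp(lgamma(theta + y) - lgamma(theta) - lgamma(y + 1)
               + y * log(xi) - (theta + y) * log(1 + xi))

def f_gp(y, theta, eta):
    # GP pmf, computed on the log scale to avoid overflow for large y:
    # theta * (theta + eta*y)^(y - 1) * exp(-theta - eta*y) / y!
    return exp(log(theta) + (y - 1) * log(theta + eta * y)
               - theta - eta * y - lgamma(y + 1))

# Truncated-support check of the stated moments.
theta, xi = 3.0, 0.8
m1 = sum(y * f_nb(y, theta, xi) for y in range(400))
m2 = sum(y * y * f_nb(y, theta, xi) for y in range(400))
# expect m1 = theta*xi = 2.4 and m2 - m1^2 = theta*xi*(1 + xi) = 4.32

theta2, eta = 2.0, 0.4
g1 = sum(y * f_gp(y, theta2, eta) for y in range(400))
g2 = sum(y * y * f_gp(y, theta2, eta) for y in range(400))
# expect g1 = theta2/(1 - eta) and g2 - g1^2 = theta2/(1 - eta)^3
```

Both tails decay geometrically (ratio ξ/(1 + ξ) for NB, ηe^{1−η} for GP), so the truncation error is negligible at y = 400.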
Cameron and Trivedi (1998) present the NBk(μ, γ) parametrization where θ = μ^{2−k}γ^{−1}
and ξ = μ^{k−1}γ, 1 ≤ k ≤ 2. For the NBk model, log μ = z^T β depends on the covariate
vector z, and either or both θ and ξ are covariate dependent. For the NB1 parametrization:
k = 1, θ = μγ^{−1}, ξ = γ; that is, θ depends on covariates, and the dispersion index ξ = γ is
constant. For the NB2 parametrization: k = 2, θ = γ^{−1}, ξ = μγ, and this is the same as in
Lawless (1987); that is, θ is constant, ξ is a function of the covariates, and the dispersion
index varies with the covariates. For 1 < k < 2, one could interpolate between these two
models using the NBk parametrization. Similarly, GP1 and GP2 regression models can be
defined.
Next, we consider stationary time series {Y_t : t = 1, 2, ...}, where the stationary
distribution is NB, GP, or general F_Y.
A Markov order 1 time series can be constructed based on a common joint distribution
F_12 for (Y_{t−1}, Y_t) for all t with marginal cdfs F_1 = F_2 = F_Y = F_NB(·; θ, ξ) or
F_GP(·; θ, η) (or another parametric univariate margin). Let f_12 and f_Y be the
corresponding bivariate and univariate pmfs. The Markov order 1 transition probability is

    Pr(Y_t = y_new | Y_{t−1} = y_prev) = f_12(y_prev, y_new) / f_Y(y_prev).
A Markov order 2 time series can be constructed based on a common joint distribution
F_123 for (Y_{t−2}, Y_{t−1}, Y_t) for all t with univariate marginal cdfs F_1 = F_2 = F_3
and bivariate margins F_12 = F_23. The ideas extend to higher-order Markov. However, for
count time series with small counts, simpler models are generally adequate for forecasting.
There are two general approaches to obtain the transition probabilities; the main ideas
can be seen with Markov order 1.
1. Thinning operator for Markov order 1 dependence: Y_t = R_t(Y_{t−1}; α) + ε_t(α),
   0 ≤ α ≤ 1, where the R_t are independent realizations of a stochastic operator, the ε_t
   are appropriate innovation random variables, and typically E[R_t(y; α) | Y_{t−1} = y] = αy
   for y = 0, 1, ....
2. Copula-based transition probability from F_12 = C(F_Y, F_Y; δ) for a copula family C
   with dependence parameter δ.
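A minimal sketch of the second approach, with the Frank copula as our illustrative choice (closed form, so no numerical integration) and a Poisson(2) margin standing in for f_Y. For integer-valued margins the bivariate pmf follows from the copula by rectangle probabilities, f_12(i, j) = C(F(i), F(j)) − C(F(i−1), F(j)) − C(F(i), F(j−1)) + C(F(i−1), F(j−1)), with F(−1) = 0:

```python
from math import exp, log, factorial

def frank(u, v, d):
    # Frank copula C(u, v; delta); d > 0 gives positive dependence
    return -log(1 + (exp(-d * u) - 1) * (exp(-d * v) - 1) / (exp(-d) - 1)) / d

def f_y(y, lam=2.0):
    # Poisson(2) margin, illustrative only
    return exp(-lam) * lam ** y / factorial(y)

def F(y, lam=2.0):
    # cdf of the margin, with F(-1) = 0
    return sum(f_y(k, lam) for k in range(y + 1)) if y >= 0 else 0.0

def f12(i, j, d=2.0):
    # bivariate pmf via rectangle probabilities of the copula
    return (frank(F(i), F(j), d) - frank(F(i - 1), F(j), d)
            - frank(F(i), F(j - 1), d) + frank(F(i - 1), F(j - 1), d))

def trans(j, i, d=2.0):
    # Pr(Y_t = j | Y_{t-1} = i) = f_12(i, j) / f_Y(i)
    return f12(i, j, d) / f_y(i)
```

The rectangle sums telescope, so each transition row sums to 1 and the bivariate pmf has margin f_Y; swapping in another parametric copula or an NB/GP margin only changes frank and f_y.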
The review paper McKenzie (2003) has a section entitled "Markov chains," but copula-based
transition models were not included. Copulas are multivariate distributions with U(0, 1)
margins, and they lead to flexible modeling of multivariate data with the dependence
structure separated from the univariate margins. References for use of copula models are
Joe (1997) and McNeil et al. (2005).
Some properties and contrasts are summarized below, with details given in subsequent
sections. Weiß (2008) has a survey of many thinning operators for count time series models,
and Fokianos (2012) has a survey of models based on thinning operators and conditional
Poisson. Some references where copulas are used for transition probabilities are Joe (1997)
(Chapter 8), Escarela et al. (2006), Biller (2009), and Beare (2010).
For thinning operators, the following hold:

• The stationary margin is infinitely divisible (such as NB, GP).
• The serial correlations are positive.
• The operator is generally interpretable and the conditional expectation is linear.
• For extension to include covariates (and/or time trends), the ease depends on the
  operator; covariates can enter into a parameter for the innovation distribution, but
  in this way, the marginal distribution does not necessarily stay in the same family.
• For extension to higher Markov orders, there are "integer autoregressive" models,
  such as INAR(p) in Du and Li (1991) or GINAR(p) in Gauthier and Latour (1994),
  and constructions as in Lawrance and Lewis (1980), Alzaid and Al-Osh (1990), and
  Zhu and Joe (2006) to keep margins in a given family. Without negative serial
  correlations, the range of autocorrelation functions is not as flexible as the
  Gaussian counterpart.
• Numerical likelihood inference is simple if the transition probability has closed
  form; for some operators, only the conditional probability generating function
  (pgf) has a simple form, and then the approach of Davies (1973) can be used to
  invert the pgf. Conditional least squares (CLS) and moment methods can estimate
  mean parameters but are not reliable for estimating the overdispersion parameter.
• There are several different thinning operators for NB or GP, and these can be
  differentiated based on the conditional heteroscedasticity Var(Y_t | Y_{t−1} = y).
• Although thinning operators can be used in models that are analogues of Gaussian
  AR(p), MA(q), and ARMA models, not as much is known about probabilistic
  properties such as stationary distributions.
For copula-based transition, the following hold:

• The stationary margin can be anything, and positive or negative serial dependence
  can be attained by choosing appropriate copula families.
• The conditional expectation is generally nonlinear, and different patterns are
  possible. The tail behavior of the copula family affects the conditional expectation
  and variance for large values.
• It is easier to combine the time series model with covariates in a univariate count
  regression model.
• The extension from Markov order 1 to higher-order Markov is straightforward.
• Theoretically, the class of autocorrelation functions is much wider than those based
  on thinning operators.
• Likelihood inference is easy if the copula family has a simple form.
• The Gaussian copula is a special case; for example, autoregressive-to-anything
  (ARTA) in Biller and Nelson (2005).
• As a slight negative compared with thinning operators, for NB/GP, the copula
  approach does not use any special univariate property.
2.3 Thinning Operators
This section has more detail on thinning operators. Operators are initially presented and
discussed without regard to stationarity and distributional issues. Notation similar to that
in Jung and Tremayne (2011) is used.
A general stochastic model is

    Y_t = R_t(Y_{t−1}, Y_{t−2}, ...) + ε_t,
where ε_t is the innovation random variable at time t and R_t is a random variable that
depends on the previous observations. In order to get a stationary model with margin
F_Y, the choice of distribution for {ε_t} depends on the distribution of
R_t(Y_{t−1}, Y_{t−2}, ...). If one is not aiming for a specific stationary distribution,
there is no constraint on the distribution of {ε_t}.
If R_t(Y_{t−1}, Y_{t−2}, ...) = R_t(Y_{t−1}), then a Markov model of order 1 is obtained,
and if R_t(Y_{t−1}, Y_{t−2}, ...) = R_t(Y_{t−1}, Y_{t−2}), then a Markov model of order 2
is obtained, etc.
For Markov order 1 models, the conditional pmf of [R(Y) | Y = y] is the same as the pmf
of R(y), and the unconditional pmf of R(Y) is

    f_{R(Y)}(x) = Σ_{y=0}^{∞} f_{R(y)}(x) f_Y(y).
The following classes of operators are included in the review in Weiß (2008):

• Binomial thinning (Steutel and Van Harn 1979): R(y) = α ∘ y =_d Σ_{i=1}^{y} I_i(α) has
  a Bin(y, α) distribution, where I_1(α), I_2(α), ... are independent Bernoulli random
  variables with mean α ∈ (0, 1). Hence E[R(y)] = αy and Var[R(y)] = α(1 − α)y.

• Expectation or generalized thinning (Latour, 1998; Zhu and Joe, 2010a): R(y) =
  K(α) ∘ y =_d Σ_{i=1}^{y} K_i(α), where K_1(α), K_2(α), ... are independent random
  variables that are replicates of K(α), which has support on N_0 and satisfies
  E[K(α)] = α ∈ [0, 1]. For the boundary cases, K(0) ≡ 0 and K(1) ≡ 1. Hence
  E[R(y)] = αy and Var[R(y)] = y Var[K(α)].

• Random coefficient thinning (Zheng et al., 2006, 2007): R(y) = A ∘ y =_d Σ_{i=1}^{y} I_i(A),
  where A has support on (0, 1) and, given A = a, the I_i(a) are independent Bernoulli(a)
  random variables. Hence Pr(A ∘ y = j) = ∫_0^1 C(y, j) a^j (1 − a)^{y−j} dF_A(a), where
  C(y, j) is the binomial coefficient. If A has mean α, then E[R(y)] = E[Ay] = αy and
  Var[R(y)] = E[A(1 − A)y] + Var(Ay) = α(1 − α)y + y(y − 1) Var(A).
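For random coefficient thinning, one convenient concrete case (our illustrative choice, not singled out in the text) is A ~ Beta(a, b), for which the integral gives the beta-binomial pmf; a short exact check of the stated mean and variance formulas:

```python
from math import comb, exp, lgamma

def betabin_pmf(j, y, a, b):
    # Pr(A o y = j) with A ~ Beta(a, b): integrating the binomial pmf over A
    # yields the beta-binomial pmf, written here via log-gamma for stability.
    return comb(y, j) * exp(lgamma(a + j) + lgamma(b + y - j) + lgamma(a + b)
                            - lgamma(a) - lgamma(b) - lgamma(a + b + y))

y, a, b = 7, 2.0, 3.0
alpha = a / (a + b)                           # E[A] = 0.4
var_a = a * b / ((a + b) ** 2 * (a + b + 1))  # Var(A) = 0.04
pmf = [betabin_pmf(j, y, a, b) for j in range(y + 1)]
mean = sum(j * p for j, p in enumerate(pmf))
var = sum(j * j * p for j, p in enumerate(pmf)) - mean ** 2
# expect mean = alpha*y = 2.8 and var = alpha*(1-alpha)*y + y*(y-1)*Var(A) = 3.36
```

Note the extra y(y − 1) Var(A) term relative to binomial thinning: the shared random coefficient A induces positive dependence among the y indicators.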
Interpretations are provided later, with a subscript on the thinning operator to emphasize
that thinnings are performed at time t.

• Time series based on binomial thinning:

      Y_t = α ∘_t Y_{t−1} + ε_t = Σ_{i=1}^{Y_{t−1}} I_{ti}(α) + ε_t,   (2.1)

  where the I_{ti}(α) are independent over t and i. It can be considered that α ∘_t Y_{t−1}
  consists of the "survivors" (continuing members) from time t − 1 to time t (with each
  individual having a probability α of continuing), and ε_t consists of the "newcomers"
  (innovations) at time t.
• Time series based on generalized thinning:

      Y_t = K(α) ∘_t Y_{t−1} + ε_t = Σ_{i=1}^{Y_{t−1}} K_{ti}(α) + ε_t,   (2.2)
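The survivors-and-newcomers reading of (2.1) is easy to simulate. The sketch below (helper names our own) uses the classical Poisson INAR(1) case, where Poisson((1 − α)λ) innovations keep the stationary margin Poisson(λ) and the lag-1 autocorrelation equals α:

```python
from math import exp
import random

def inar1_poisson(n, lam, alpha, seed=1):
    # Simulate Y_t = alpha o_t Y_{t-1} + eps_t with eps_t ~ Poisson((1-alpha)*lam);
    # the stationary margin is then Poisson(lam), lag-1 autocorrelation alpha.
    rng = random.Random(seed)

    def pois(m):
        # Poisson(m) draw by inversion of the cdf
        u, p, k = rng.random(), exp(-m), 0
        c = p
        while u > c:
            k += 1
            p *= m / k
            c += p
        return k

    y = pois(lam)                    # start from the stationary margin
    path = [y]
    for _ in range(n - 1):
        # "survivors": each of the y current members continues w.p. alpha
        survivors = sum(rng.random() < alpha for _ in range(y))
        # "newcomers": the innovation at time t
        y = survivors + pois((1 - alpha) * lam)
        path.append(y)
    return path

path = inar1_poisson(20000, 3.0, 0.5)
```

The sample mean of the path should be close to λ = 3 and the lag-1 sample autocorrelation close to α = 0.5.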