2
Markov Models for Count Time Series
Harry Joe
CONTENTS
2.1 Introduction ............................................................ 29
2.2 Models for Count Data ................................................... 30
2.3 Thinning Operators ...................................................... 32
    2.3.1 Analogues of Gaussian AR(p) ....................................... 34
    2.3.2 Analogues of Gaussian Autoregressive Moving Average ............... 36
    2.3.3 Classes of Generalized Thinning Operators ......................... 36
    2.3.4 Estimation ........................................................ 37
    2.3.5 Incorporation of Covariates ....................................... 38
2.4 Operators in Convolution-Closed Class ................................... 38
2.5 Copula-Based Transition ................................................. 41
2.6 Statistical Inference and Model Comparisons ............................. 44
References .................................................................. 47
2.1 Introduction
The focus of this chapter is on the construction of count time series models based on thin-
ning operators or a joint distribution on consecutive observations, and comparison of the
properties of the resulting models.
The models for count time series considered here are mainly intended for low counts
with the possibility of 0. If all counts are large and "far from 0," the models considered
here can be used, as well as models that treat the count response as continuous.
Count data are often overdispersed relative to Poisson. There are many count regression
models with covariates; examples are regression models with negative binomial (NB),
generalized Poisson (GP), zero-inflated Poisson, zero-inflated NB, etc.
If the count data are observed as a time series sequence, then the count regression model
can be adapted in two ways: (1) add previous observations as covariates and (2) make use
of some models for stationary count time series. Methodology for case (1) is covered in
Davis et al. (2000) and Fokianos (2011), and here, we discuss the quite different methodol-
ogy for case (2). The advantage of a time series regression model with univariate margins
corresponding to a count regression model is that predictions as a function of covariates can
be made with or without preceding observations. That is, this is useful if one is primarily
interested in regression but with time-dependent observations.
30 Handbook of Discrete-Valued Time Series
Common parametric models for count regression are NB and GP, and these include Pois-
son regression at the boundary. In this chapter, we use these two count regression models
for concreteness, but some approaches, such as copula-based models, can accommodate
other count distributions.
The remainder of the chapter is organized as follows. Section 2.2 summarizes some
count regression models and contrasts some properties of count time series models
constructed under different approaches. Sections 2.3 through 2.5 provide some details for
count time series models based, respectively, on thinning operators, multivariate
distributions with random variables in a convolution-closed infinitely divisible class, and
copulas for consecutive observations. Section 2.6 compares the fits of different models for
one data set.
Some conventions and notation that are subsequently used are as follows: f is used for
probability mass functions (pmfs) and F is used for cumulative distribution functions (cdfs),
with the subscript used to indicate the margin or random vector; Σ_{i=1}^{y} k_i = 0 when
y = 0; N_0 is the set of nonnegative integers; ε_t is used for the innovation at time t (that
is, ε_t is independent of random variables at times t − 1, t − 2, ... in the stochastic
representation).
2.2 Models for Count Data
In this section, we show how NB and GP distributions have been used for count regression
and count time series.
NB and GP regression models are nonunique in how regression coefficients β are introduced
into the univariate parameters. If the mean is assumed to be loglinear in covariates,
there does not exist a unique model, because the mean involves the convolution parameter
and a second parameter that links to overdispersion.
Brief details are summarized as follows, with F_NB and F_GP denoting the cdfs and f_NB
and f_GP denoting the pmfs:
1. (NB): θ convolution parameter, π probability parameter, ξ = π^{−1} − 1 ≥ 0; mean
   μ = θξ = θ(1 − π)/π, variance σ² = μ(1 + ξ) = θ(1 − π)/π², and

       f_NB(y; θ, ξ) = [Γ(θ + y) / (Γ(θ) y!)] · ξ^y / (1 + ξ)^{θ+y},   y = 0, 1, 2, ...,  θ > 0, ξ > 0.

   If θ → ∞ and ξ → 0 with θξ fixed, the Poisson distribution is obtained.
2. (GP): θ convolution parameter, second parameter 0 ≤ η < 1; mean μ = θ/(1 − η),
   variance σ² = θ/(1 − η)³, and

       f_GP(y; θ, η) = θ(θ + ηy)^{y−1} e^{−θ−ηy} / y!,   y = 0, 1, 2, ...,  θ > 0, 0 ≤ η < 1.

   If η = 0, the Poisson distribution is obtained.
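As a numerical sanity check of the two pmfs and the stated mean/variance formulas — a minimal sketch, with the function names f_nb and f_gp our own and the supports truncated at y = 400:

```python
from math import exp, lgamma, log

def f_nb(y, theta, xi):
    # NB pmf: Gamma(theta + y) / (Gamma(theta) * y!) * xi^y / (1 + xi)^(theta + y)
    return exp(lgamma(theta + y) - lgamma(theta) - lgamma(y + 1)
               + y * log(xi) - (theta + y) * log(1 + xi))

def f_gp(y, theta, eta):
    # GP pmf, computed on the log scale to avoid overflow for large y:
    # theta * (theta + eta*y)^(y - 1) * exp(-theta - eta*y) / y!
    return exp(log(theta) + (y - 1) * log(theta + eta * y)
               - theta - eta * y - lgamma(y + 1))

# Truncated-support check of the stated moments.
theta, xi = 3.0, 0.8
m1 = sum(y * f_nb(y, theta, xi) for y in range(400))
m2 = sum(y * y * f_nb(y, theta, xi) for y in range(400))
# expect m1 = theta*xi = 2.4 and m2 - m1^2 = theta*xi*(1 + xi) = 4.32

theta2, eta = 2.0, 0.4
g1 = sum(y * f_gp(y, theta2, eta) for y in range(400))
g2 = sum(y * y * f_gp(y, theta2, eta) for y in range(400))
# expect g1 = theta2/(1 - eta) and g2 - g1^2 = theta2/(1 - eta)^3
```

Both tails decay geometrically (ratio ξ/(1 + ξ) for NB, ηe^{1−η} for GP), so the truncation error is negligible at y = 400.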
Cameron and Trivedi (1998) present the NBk(μ, γ) parametrization where θ = μ^{2−k}γ^{−1}
and ξ = μ^{k−1}γ, 1 ≤ k ≤ 2. For the NBk model, log μ = z^T β depends on the covariate
vector z, and either or both θ and ξ are covariate dependent. For the NB1 parametrization:
k = 1, θ = μγ^{−1}, ξ = γ; that is, θ depends on covariates, and the dispersion index ξ = γ is
constant. For the NB2 parametrization: k = 2, θ = γ^{−1}, ξ = μγ, and this is the same as in
Lawless (1987); that is, θ is constant, ξ is a function of the covariates, and the dispersion
index varies with the covariates. For 1 < k < 2, one could interpolate between these two
models using the NBk parametrization. Similarly, GP1 and GP2 regression models can be
defined.
Next, we consider stationary time series {Y_t : t = 1, 2, ...}, where the stationary
distribution is NB, GP, or general F_Y.
A Markov order 1 time series can be constructed based on a common joint distribution
F_12 for (Y_{t−1}, Y_t) for all t with marginal cdfs F_1 = F_2 = F_Y = F_NB(·; θ, ξ) or
F_GP(·; θ, η) (or another parametric univariate margin). Let f_12 and f_Y be the
corresponding bivariate and univariate pmfs. The Markov order 1 transition probability is

    Pr(Y_t = y_new | Y_{t−1} = y_prev) = f_12(y_prev, y_new) / f_Y(y_prev).
A Markov order 2 time series can be constructed based on a common joint distribution
F_123 for (Y_{t−2}, Y_{t−1}, Y_t) for all t with univariate marginal cdfs F_1 = F_2 = F_3
and bivariate margins F_12 = F_23. The ideas extend to higher-order Markov. However, for
count time series with small counts, simpler models are generally adequate for forecasting.
There are two general approaches to obtain the transition probabilities; the main ideas
can be seen with Markov order 1.
1. Thinning operator for Markov order 1 dependence: Y_t = R_t(Y_{t−1}; α) + ε_t(α),
   0 ≤ α ≤ 1, where the R_t are independent realizations of a stochastic operator, the ε_t
   are appropriate innovation random variables, and typically E[R_t(y; α) | Y_{t−1} = y] = αy
   for y = 0, 1, ....
2. Copula-based transition probability from F_12 = C(F_Y, F_Y; δ) for a copula family C
   with dependence parameter δ.
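A minimal sketch of the second approach, with the Frank copula as our illustrative choice (closed form, so no numerical integration) and a Poisson(2) margin standing in for f_Y. For integer-valued margins the bivariate pmf follows from the copula by rectangle probabilities, f_12(i, j) = C(F(i), F(j)) − C(F(i−1), F(j)) − C(F(i), F(j−1)) + C(F(i−1), F(j−1)), with F(−1) = 0:

```python
from math import exp, log, factorial

def frank(u, v, d):
    # Frank copula C(u, v; delta); d > 0 gives positive dependence
    return -log(1 + (exp(-d * u) - 1) * (exp(-d * v) - 1) / (exp(-d) - 1)) / d

def f_y(y, lam=2.0):
    # Poisson(2) margin, illustrative only
    return exp(-lam) * lam ** y / factorial(y)

def F(y, lam=2.0):
    # cdf of the margin, with F(-1) = 0
    return sum(f_y(k, lam) for k in range(y + 1)) if y >= 0 else 0.0

def f12(i, j, d=2.0):
    # bivariate pmf via rectangle probabilities of the copula
    return (frank(F(i), F(j), d) - frank(F(i - 1), F(j), d)
            - frank(F(i), F(j - 1), d) + frank(F(i - 1), F(j - 1), d))

def trans(j, i, d=2.0):
    # Pr(Y_t = j | Y_{t-1} = i) = f_12(i, j) / f_Y(i)
    return f12(i, j, d) / f_y(i)
```

The rectangle sums telescope, so each transition row sums to 1 and the bivariate pmf has margin f_Y; swapping in another parametric copula or an NB/GP margin only changes frank and f_y.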
The review paper McKenzie (2003) has a section entitled "Markov chains," but copula-based
transition models were not included. Copulas are multivariate distributions with U(0, 1)
margins, and they lead to flexible modeling of multivariate data with the dependence
structure separated from the univariate margins. References for use of copula models are
Joe (1997) and McNeil et al. (2005).
Some properties and contrasts are summarized below, with details given in subsequent
sections. Weiß (2008) has a survey of many thinning operators for count time series models,
and Fokianos (2012) has a survey of models based on thinning operators and conditional
Poisson. Some references where copulas are used for transition probabilities are Joe (1997)
(Chapter 8), Escarela et al. (2006), Biller (2009), and Beare (2010).
For thinning operators, the following hold:

• The stationary margin is infinitely divisible (such as NB, GP).
• The serial correlations are positive.
• The operator is generally interpretable and the conditional expectation is linear.
• For extension to include covariates (and/or time trends), the ease depends on the
  operator; covariates can enter into a parameter for the innovation distribution, but
  in this way, the marginal distribution does not necessarily stay in the same family.
• For extension to higher Markov orders, there are "integer autoregressive" models,
  such as INAR(p) in Du and Li (1991) or GINAR(p) in Gauthier and Latour (1994),
  and constructions as in Lawrance and Lewis (1980), Alzaid and Al-Osh (1990), and
  Zhu and Joe (2006) to keep margins in a given family. Without negative serial
  correlations, the range of autocorrelation functions is not as flexible as the
  Gaussian counterpart.
• Numerical likelihood inference is simple if the transition probability has closed
  form; for some operators, only the conditional probability generating function
  (pgf) has a simple form, and then the approach of Davies (1973) can be used to
  invert the pgf. Conditional least squares (CLS) and moment methods can estimate
  mean parameters but are not reliable for estimating the overdispersion parameter.
• There are several different thinning operators for NB or GP, and these can be
  differentiated based on the conditional heteroscedasticity Var(Y_t | Y_{t−1} = y).
• Although thinning operators can be used in models that are analogues of Gaussian
  AR(p), MA(q), and ARMA models, not as much is known about probabilistic
  properties such as stationary distributions.
For copula-based transition, the following hold:

• The stationary margin can be anything, and positive or negative serial dependence
  can be attained by choosing appropriate copula families.
• The conditional expectation is generally nonlinear, and different patterns are
  possible. The tail behavior of the copula family affects the conditional expectation
  and variance for large values.
• It is easier to combine the time series model with covariates in a univariate count
  regression model.
• The extension from Markov order 1 to higher-order Markov is straightforward.
• Theoretically, the class of autocorrelation functions is much wider than those based
  on thinning operators.
• Likelihood inference is easy if the copula family has a simple form.
• The Gaussian copula is a special case; for example, autoregressive-to-anything
  (ARTA) in Biller and Nelson (2005).
• As a slight negative compared with thinning operators, for NB/GP, the copula
  approach does not use any special univariate property.
2.3 Thinning Operators
This section has more detail on thinning operators. Operators are initially presented and
discussed without regard to stationarity and distributional issues. Notation similar to that
in Jung and Tremayne (2011) is used.
A general stochastic model is

    Y_t = R_t(Y_{t−1}, Y_{t−2}, ...) + ε_t,
where ε_t is the innovation random variable at time t and R_t is a random variable that
depends on the previous observations. In order to get a stationary model with margin
F_Y, the choice of distribution for {ε_t} depends on the distribution of
R_t(Y_{t−1}, Y_{t−2}, ...). If one is not aiming for a specific stationary distribution,
there is no constraint on the distribution of {ε_t}.
If R_t(Y_{t−1}, Y_{t−2}, ...) = R_t(Y_{t−1}), then a Markov model of order 1 is obtained,
and if R_t(Y_{t−1}, Y_{t−2}, ...) = R_t(Y_{t−1}, Y_{t−2}), then a Markov model of order 2
is obtained, etc.
For Markov order 1 models, the conditional pmf of [R(Y) | Y = y] is the same as the pmf
of R(y), and the unconditional pmf of R(Y) is

    f_{R(Y)}(x) = Σ_{y=0}^{∞} f_{R(y)}(x) f_Y(y).
The following classes of operators are included in the review in Weiß (2008):

• Binomial thinning (Steutel and Van Harn 1979): R(y) = α ∘ y =_d Σ_{i=1}^{y} I_i(α) has
  a Bin(y, α) distribution, where I_1(α), I_2(α), ... are independent Bernoulli random
  variables with mean α ∈ (0, 1). Hence E[R(y)] = αy and Var[R(y)] = α(1 − α)y.

• Expectation or generalized thinning (Latour, 1998; Zhu and Joe, 2010a): R(y) =
  K(α) ∘ y =_d Σ_{i=1}^{y} K_i(α), where K_1(α), K_2(α), ... are independent random
  variables that are replicates of K(α), which has support on N_0 and satisfies
  E[K(α)] = α ∈ [0, 1]. For the boundary cases, K(0) ≡ 0 and K(1) ≡ 1. Hence
  E[R(y)] = αy and Var[R(y)] = y Var[K(α)].

• Random coefficient thinning (Zheng et al., 2006, 2007): R(y) = A ∘ y =_d Σ_{i=1}^{y} I_i(A),
  where A has support on (0, 1) and, given A = a, the I_i(a) are independent Bernoulli(a)
  random variables. Hence Pr(A ∘ y = j) = ∫_0^1 C(y, j) a^j (1 − a)^{y−j} dF_A(a), where
  C(y, j) is the binomial coefficient. If A has mean α, then E[R(y)] = E[Ay] = αy and
  Var[R(y)] = E[A(1 − A)y] + Var(Ay) = α(1 − α)y + y(y − 1) Var(A).
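For random coefficient thinning, one convenient concrete case (our illustrative choice, not singled out in the text) is A ~ Beta(a, b), for which the integral gives the beta-binomial pmf; a short exact check of the stated mean and variance formulas:

```python
from math import comb, exp, lgamma

def betabin_pmf(j, y, a, b):
    # Pr(A o y = j) with A ~ Beta(a, b): integrating the binomial pmf over A
    # yields the beta-binomial pmf, written here via log-gamma for stability.
    return comb(y, j) * exp(lgamma(a + j) + lgamma(b + y - j) + lgamma(a + b)
                            - lgamma(a) - lgamma(b) - lgamma(a + b + y))

y, a, b = 7, 2.0, 3.0
alpha = a / (a + b)                           # E[A] = 0.4
var_a = a * b / ((a + b) ** 2 * (a + b + 1))  # Var(A) = 0.04
pmf = [betabin_pmf(j, y, a, b) for j in range(y + 1)]
mean = sum(j * p for j, p in enumerate(pmf))
var = sum(j * j * p for j, p in enumerate(pmf)) - mean ** 2
# expect mean = alpha*y = 2.8 and var = alpha*(1-alpha)*y + y*(y-1)*Var(A) = 3.36
```

Note the extra y(y − 1) Var(A) term relative to binomial thinning: the shared random coefficient A induces positive dependence among the y indicators.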
Interpretations are provided later, with a subscript on the thinning operator to emphasize
that thinnings are performed at time t.

• Time series based on binomial thinning:

      Y_t = α ∘_t Y_{t−1} + ε_t = Σ_{i=1}^{Y_{t−1}} I_{ti}(α) + ε_t,   (2.1)

  where the I_{ti}(α) are independent over t and i. It can be considered that α ∘_t Y_{t−1}
  consists of the "survivors" (continuing members) from time t − 1 to time t (with each
  individual having a probability α of continuing), and ε_t consists of the "newcomers"
  (innovations) at time t.
• Time series based on generalized thinning:

      Y_t = K(α) ∘_t Y_{t−1} + ε_t = Σ_{i=1}^{Y_{t−1}} K_{ti}(α) + ε_t,   (2.2)
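The survivors-and-newcomers reading of (2.1) is easy to simulate. The sketch below (helper names our own) uses the classical Poisson INAR(1) case, where Poisson((1 − α)λ) innovations keep the stationary margin Poisson(λ) and the lag-1 autocorrelation equals α:

```python
from math import exp
import random

def inar1_poisson(n, lam, alpha, seed=1):
    # Simulate Y_t = alpha o_t Y_{t-1} + eps_t with eps_t ~ Poisson((1-alpha)*lam);
    # the stationary margin is then Poisson(lam), lag-1 autocorrelation alpha.
    rng = random.Random(seed)

    def pois(m):
        # Poisson(m) draw by inversion of the cdf
        u, p, k = rng.random(), exp(-m), 0
        c = p
        while u > c:
            k += 1
            p *= m / k
            c += p
        return k

    y = pois(lam)                    # start from the stationary margin
    path = [y]
    for _ in range(n - 1):
        # "survivors": each of the y current members continues w.p. alpha
        survivors = sum(rng.random() < alpha for _ in range(y))
        # "newcomers": the innovation at time t
        y = survivors + pois((1 - alpha) * lam)
        path.append(y)
    return path

path = inar1_poisson(20000, 3.0, 0.5)
```

The sample mean of the path should be close to λ = 3 and the lag-1 sample autocorrelation close to α = 0.5.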