Note that by assuming any distribution other than Bernoulli for the $Z_i$'s in (19.2), we get a generalized Steutel and van Harn operator. Other operators can also be defined similarly. A review of thinning operators can be found in Weiß (2008).
The model can be generalized to have $p$ terms (i.e., INAR($p$)), but there is no unique way to do this. Moving average (MA) terms can be added to the model, leading to INARMA models. Covariates can also be introduced to model the mean of the innovation term, allowing the effect of additional information to be measured and leading to INAR regression models. We next present extensions to the multivariate case by first extending the thinning operator to a matrix-valued form and then presenting the multivariate INAR model.
Let $A$ be an $r \times r$ matrix with elements $\alpha_{ij}$, $i, j = 1, \ldots, r$, and let $Y$ be a nonnegative integer-valued $r$-dimensional vector. The matrix-valued operator is defined as
$$A \circ Y = \begin{pmatrix} \sum_{j=1}^{r} \alpha_{1j} \circ Y_j \\ \vdots \\ \sum_{j=1}^{r} \alpha_{rj} \circ Y_j \end{pmatrix}.$$
The univariate operations $\alpha \circ X$ and $\beta \circ Y$ are independent if and only if the counting processes in their definitions are independent. Hence, the matrix-valued operator implies independence between the univariate operators. Properties of this operator can be found in Latour (1997).
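To fix ideas, the following minimal Python sketch implements binomial thinning and the matrix-valued operator; the names `thin` and `matrix_thin` are purely illustrative and not from the cited references.

```python
import numpy as np

rng = np.random.default_rng(42)

def thin(alpha, x):
    """Binomial thinning alpha o x: a Binomial(x, alpha) draw,
    i.e., the sum of x independent Bernoulli(alpha) counting variables."""
    return rng.binomial(n=x, p=alpha)

def matrix_thin(A, y):
    """Matrix-valued operator A o Y: element i is sum_j alpha_ij o Y_j,
    with all univariate thinning operations performed independently."""
    return np.array([sum(thin(A[i, j], y[j]) for j in range(len(y)))
                     for i in range(A.shape[0])])

A = np.array([[0.4, 0.1],
              [0.2, 0.3]])
y = np.array([5, 7])
print(matrix_thin(A, y))  # one random realization of A o Y
```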
Using this operator, Latour (1997) defined a multivariate generalized INAR process of order $p$ (MGINAR($p$)) by assuming that
$$Y_t = \sum_{j=1}^{p} A_j \circ Y_{t-j} + \boldsymbol{\epsilon}_t,$$
where $Y_t$ and $\boldsymbol{\epsilon}_t$ are $r$-vectors and $A_j$, $j = 1, \ldots, p$, are $r \times r$ matrices, and gave conditions for existence and stationarity. A more focused presentation of the model follows.
19.3.2 Multivariate INAR Model
Let $Y$ and $R$ be nonnegative integer-valued random $r$-vectors and let $A$ be an $r \times r$ matrix with elements $\{\alpha_{ij}\}_{i,j=1,\ldots,r}$. The MINAR(1) process can be defined as
$$Y_t = A \circ Y_{t-1} + R_t = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1r} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{r1} & \alpha_{r2} & \cdots & \alpha_{rr} \end{pmatrix} \circ \begin{pmatrix} Y_{1,t-1} \\ Y_{2,t-1} \\ \vdots \\ Y_{r,t-1} \end{pmatrix} + \begin{pmatrix} R_{1t} \\ R_{2t} \\ \vdots \\ R_{rt} \end{pmatrix}, \quad t = 1, 2, \ldots \tag{19.3}$$
The vector of innovations $R_t$ follows an $r$-variate discrete distribution, which characterizes the marginal distribution of the $Y_t$ as well; more on this will follow. When $A$ is diagonal, we will call the model a diagonal MINAR, which clearly has less structure.
The nonnegative integer-valued random process $\{Y_t\}_{t \in \mathbb{Z}}$ is the unique strictly stationary solution of (19.3) if the largest eigenvalue of the matrix $A$ is less than 1 and $E\|R_t\| < \infty$ (see also Franke and Rao, 1995; Latour, 1997).
To help the exposition, consider the case with $r = 2$. The two series can be written as
$$Y_{1t} = \alpha_{11} \circ Y_{1,t-1} + \alpha_{12} \circ Y_{2,t-1} + R_{1t},$$
$$Y_{2t} = \alpha_{22} \circ Y_{2,t-1} + \alpha_{21} \circ Y_{1,t-1} + R_{2t}.$$
This helps in understanding the dynamics. The cross-correlation between the two series comes both from sharing common elements and from the joint distribution of $(R_{1t}, R_{2t})$. If $A$ is a diagonal matrix in this bivariate example, so that $\alpha_{12} = \alpha_{21} = 0$, then the two series are univariate INAR models but are still correlated due to the joint pmf of $(R_{1t}, R_{2t})$.
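As an illustration, here is a simulation sketch of the bivariate model under the simplifying assumption of independent Poisson innovations; any bivariate discrete distribution could be substituted for the joint pmf of $(R_{1t}, R_{2t})$.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_minar1(A, lam, T, y0=(0, 0)):
    """Simulate a bivariate MINAR(1) path Y_t = A o Y_{t-1} + R_t.
    Assumes independent Poisson(lam_i) innovations for simplicity."""
    Y = np.zeros((T, 2), dtype=int)
    y = np.asarray(y0)
    for t in range(T):
        # Row i of A thins Y_{t-1}: alpha_i1 o Y_{1,t-1} + alpha_i2 o Y_{2,t-1}
        thinned = np.array([rng.binomial(y, A[i]).sum() for i in range(2)])
        y = thinned + rng.poisson(lam)
        Y[t] = y
    return Y

A = np.array([[0.4, 0.1],
              [0.2, 0.3]])   # largest eigenvalue 0.5 < 1, so a stationary solution exists
lam = np.array([1.0, 2.0])
Y = simulate_minar1(A, lam, T=5000)
print(np.corrcoef(Y.T))      # empirical cross-correlation of the two series
```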
Taking expectations on both sides of (19.3), it is straightforward to obtain
$$\boldsymbol{\mu} = E(Y_t) = [I - A]^{-1} E(R_t). \tag{19.4}$$
The variance–covariance matrix $\gamma(0) = E[(Y_t - \boldsymbol{\mu})(Y_t - \boldsymbol{\mu})']$ satisfies a difference equation of the form
$$\gamma(0) = A\gamma(0)A' + \operatorname{diag}(B\boldsymbol{\mu}) + \operatorname{Var}(R_t), \tag{19.5}$$
where, for binomial thinning, $B$ is the matrix with elements $\alpha_{ij}(1 - \alpha_{ij})$, consistent with the bivariate expressions given below.
The innovation series $R_t$ consists of identically distributed sequences $\{R_{it}\}_{i=1}^{r}$ and has mean $E(R_t) = \boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_r)'$ and variance
$$\operatorname{Var}(R_t) = \begin{pmatrix} \upsilon_1 \lambda_1 & \phi_{12} & \cdots & \phi_{1r} \\ \phi_{12} & \upsilon_2 \lambda_2 & \cdots & \phi_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{1r} & \phi_{2r} & \cdots & \upsilon_r \lambda_r \end{pmatrix},$$
where $\upsilon_i > 0$, $i = 1, \ldots, r$. Depending on the value of the parameter $\upsilon_i$, the assumptions of equidispersion ($\upsilon_i = 1$), overdispersion ($\upsilon_i > 1$), and underdispersion ($\upsilon_i \in (0, 1)$) can be obtained.
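These moment relations are straightforward to evaluate numerically. The sketch below assumes independent Poisson innovations (so $\upsilon_i = 1$ and $\phi_{ij} = 0$) and takes $B$ to have elements $\alpha_{ij}(1 - \alpha_{ij})$, consistent with the bivariate expressions that follow.

```python
import numpy as np

A = np.array([[0.4, 0.1],
              [0.2, 0.3]])
lam = np.array([1.0, 2.0])   # E(R_t) = lambda
var_R = np.diag(lam)         # independent Poisson innovations: Var(R_t) = diag(lambda)

# Stationary mean, equation (19.4): mu = (I - A)^{-1} E(R_t)
mu = np.linalg.solve(np.eye(2) - A, lam)

# Stationary covariance, equation (19.5): gamma(0) = A gamma(0) A' + diag(B mu) + Var(R_t),
# with B_ij = alpha_ij (1 - alpha_ij); solved by fixed-point iteration, which converges
# because the largest eigenvalue of A is below 1.
B = A * (1 - A)
C = np.diag(B @ mu) + var_R
gamma0 = np.zeros((2, 2))
for _ in range(200):
    gamma0 = A @ gamma0 @ A.T + C

print(mu)       # stationary mean vector
print(gamma0)   # stationary variance-covariance matrix gamma(0)
```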
In the bivariate case, that is, when $r = 2$, it can be proved that the vector of expectations (19.4) has elements
$$\mu_1 = \frac{(1 - \alpha_{22})\lambda_1 + \alpha_{12}\lambda_2}{(1 - \alpha_{11})(1 - \alpha_{22}) - \alpha_{12}\alpha_{21}}, \qquad \mu_2 = \frac{(1 - \alpha_{11})\lambda_2 + \alpha_{21}\lambda_1}{(1 - \alpha_{11})(1 - \alpha_{22}) - \alpha_{12}\alpha_{21}},$$
while the elements of $\gamma(0)$ are
$$\gamma_{11}(0) = \operatorname{Var}(Y_{1t}) = \frac{1}{1 - \alpha_{11}^2}\Big[\alpha_{12}^2 \operatorname{Var}(Y_{2t}) + 2\alpha_{11}\alpha_{12}\operatorname{Cov}(Y_{1t}, Y_{2t}) + \alpha_{11}(1 - \alpha_{11})\mu_1 + \alpha_{12}(1 - \alpha_{12})\mu_2 + \upsilon_1\lambda_1\Big],$$
$$\gamma_{22}(0) = \operatorname{Var}(Y_{2t}) = \frac{1}{1 - \alpha_{22}^2}\Big[\alpha_{21}^2 \operatorname{Var}(Y_{1t}) + 2\alpha_{22}\alpha_{21}\operatorname{Cov}(Y_{1t}, Y_{2t}) + \alpha_{22}(1 - \alpha_{22})\mu_2 + \alpha_{21}(1 - \alpha_{21})\mu_1 + \upsilon_2\lambda_2\Big],$$
$$\gamma_{12}(0) = \gamma_{21}(0) = \operatorname{Cov}(Y_{1t}, Y_{2t}) = \frac{\alpha_{11}\alpha_{21}\operatorname{Var}(Y_{1t}) + \alpha_{22}\alpha_{12}\operatorname{Var}(Y_{2t}) + \phi}{1 - \alpha_{11}\alpha_{22} - \alpha_{12}\alpha_{21}},$$
where $\phi$ is the covariance between the innovations.
Note that $\operatorname{Cov}(Y_{it}, R_{jt}) = \operatorname{Cov}(R_{it}, R_{jt})$, $i, j = 1, \ldots, r$, $i \neq j$ (Pedeli and Karlis, 2011). That is, the covariance between the current value of one process and the innovations of the other process at time $t$ equals the covariance of the innovations of the two series at the same time $t$.
Regarding the covariance function $\gamma(h) = E[(Y_{t+h} - \boldsymbol{\mu})(Y_t - \boldsymbol{\mu})']$ for $h > 0$, iterative calculations provide us with an expression of the form
$$\gamma(h) = A\gamma(h - 1) = A^h \gamma(0), \quad h \geq 1, \tag{19.6}$$
where $\gamma(0)$ is given by (19.5).
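Numerically, (19.6) amounts to a matrix power. A small sketch, with $\gamma(0)$ treated as given (in practice it would be obtained by solving (19.5) as above):

```python
import numpy as np

def gamma_h(A, gamma0, h):
    """Autocovariance matrix at lag h >= 1 via equation (19.6): gamma(h) = A^h gamma(0)."""
    return np.linalg.matrix_power(A, h) @ gamma0

A = np.array([[0.4, 0.1],
              [0.2, 0.3]])
gamma0 = np.array([[2.2, 0.9],
                   [0.9, 3.6]])   # illustrative value; in practice solve (19.5) for gamma(0)
for h in range(1, 4):
    print(h, gamma_h(A, gamma0, h))  # decays geometrically since eigenvalues of A are < 1
```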
Applying the well-known Cayley–Hamilton theorem to (19.6), it is straightforward to show that the marginal processes have an ARMA($r$, $r-1$) correlation structure. Since $A$ is an $r \times r$ matrix, the Cayley–Hamilton theorem ensures that there exist constants $\xi_1, \ldots, \xi_r$ such that $A^r - \xi_1 A^{r-1} - \cdots - \xi_r I = 0$. Thus, $\gamma(h)$ satisfies
$$\gamma(h) - \xi_1 \gamma(h - 1) - \cdots - \xi_r \gamma(h - r) = 0, \quad h \geq r. \tag{19.7}$$
Equations (19.6) and (19.7) hold for every element of $\gamma(h)$, and hence the autocorrelation function of $\{Y_{jt}\}$, $j = 1, \ldots, r$, satisfies
$$\rho_{jj}(h) - \sum_{i=1}^{r} \xi_i \rho_{jj}(h - i) = 0, \quad h \geq r.$$
Thus, each component has an ARMA($r$, $r-1$) correlation structure (see also McKenzie, 1988; Dewald et al., 1989). In the simplest case of a BINAR(1) model, the marginal processes have ARMA(2,1) correlations with $\xi_1 = \alpha_{11} + \alpha_{22}$ and $\xi_2 = \alpha_{12}\alpha_{21} - \alpha_{11}\alpha_{22}$. For the diagonal MINAR($p$) case, the marginal process is the simple univariate INAR($p$) process.
Al-Osh and Alzaid (1987) expressed the marginal distribution of the INAR(1) model in terms of the innovation sequence $\{R_t\}$, that is, $Y_t \stackrel{d}{=} \sum_{i=0}^{\infty} \alpha^i \circ R_{t-i}$. This result was easily extended to the case of a diagonal MINAR(1) process (Pedeli and Karlis, 2013c), where
$$Y_{jt} \stackrel{d}{=} \sum_{i=0}^{\infty} \alpha_j^i \circ R_{j,t-i}.$$
For the general MINAR(1) process, the distribution can also be expressed in terms of the multivariate innovation sequence $R_t$ as
$$Y_t \stackrel{d}{=} \sum_{i=0}^{\infty} A^i \circ R_{t-i},$$
where $A^i = P D^i P^{-1}$. Here, $P$ is the matrix of the eigenvectors of $A$ and $D$ is the diagonal matrix of the eigenvalues of $A$. Since all the eigenvalues must be smaller than 1 for stationarity to hold, the matrix $D^i$ tends to a zero matrix as $i \to \infty$, and hence $A^i$ tends to zero as well.
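A quick numerical check of this representation, assuming $A$ is diagonalizable (which the eigendecomposition below requires):

```python
import numpy as np

A = np.array([[0.4, 0.1],
              [0.2, 0.3]])
eigvals, P = np.linalg.eig(A)    # columns of P are eigenvectors of A
D = np.diag(eigvals)

i = 10
Ai = P @ np.linalg.matrix_power(D, i) @ np.linalg.inv(P)
print(np.allclose(Ai, np.linalg.matrix_power(A, i)))   # True: A^i = P D^i P^{-1}
print(Ai)   # entries shrink toward zero as i grows, since all eigenvalues are below 1
```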
The usefulness of such expressions is that they facilitate the derivation of the (joint) probability generating function (pgf) of the (multivariate) process, thus revealing its distribution. Assuming stationarity, the joint pgf $G_Y(s)$ satisfies the difference equation
$$G_Y(s) = G_Y(A^T s)\, G_R(s).$$
More details can be found in Pedeli and Karlis (2013c).
Extensions of the model mentioned earlier are possible. One can add covariates to the mean of the innovations using a log link function. This allows us to fit the effect of covariates to the observed multivariate time series; see Pedeli and Karlis (2013b) for such an application. Also, extensions to higher order are straightforward but lead to rather complicated models.
19.3.3 Estimation
The least squares approach to estimation was discussed in Latour (1997). However, under parametric assumptions for the innovations, other estimation methods become available. Parametric models also offer more flexibility for prediction.
For the estimation of the BINAR(1) model, the method of conditional maximum likelihood can be used. The conditional density of the BINAR(1) model can be constructed as the convolution of
$$f_1(k) = \sum_{j_1=0}^{k} \binom{y_{1,t-1}}{j_1} \binom{y_{2,t-1}}{k - j_1} \alpha_{11}^{j_1}(1 - \alpha_{11})^{y_{1,t-1} - j_1}\, \alpha_{12}^{k - j_1}(1 - \alpha_{12})^{y_{2,t-1} - k + j_1},$$
$$f_2(s) = \sum_{j_2=0}^{s} \binom{y_{2,t-1}}{j_2} \binom{y_{1,t-1}}{s - j_2} \alpha_{22}^{j_2}(1 - \alpha_{22})^{y_{2,t-1} - j_2}\, \alpha_{21}^{s - j_2}(1 - \alpha_{21})^{y_{1,t-1} - s + j_2},$$
and a bivariate distribution of the form $f_3(r_1, r_2) = P(R_{1t} = r_1, R_{2t} = r_2)$. The functions $f_1(\cdot)$ and $f_2(\cdot)$ are each the pmf of a convolution of two binomial variates. Thus, the conditional density takes the form
$$f(y_t \,|\, y_{t-1}, \theta) = \sum_{k=0}^{g_1} \sum_{s=0}^{g_2} f_1(k)\, f_2(s)\, f_3(y_{1t} - k,\, y_{2t} - s),$$
where $g_1 = \min(y_{1t}, y_{1,t-1})$ and $g_2 = \min(y_{2t}, y_{2,t-1})$. Maximum likelihood estimates of the vector of unknown parameters $\theta$ can be obtained by maximization of the conditional likelihood function
$$L(\theta \,|\, y) = \prod_{t=1}^{T} f(y_t \,|\, y_{t-1}, \theta) \tag{19.8}$$
for some initial value $y_0$. The asymptotic normality of the conditional maximum likelihood estimate $\hat{\theta}$ has been shown by Franke and Rao (1995) after imposing a set of regularity conditions and applying the results of Billingsley (1961) for the estimation of Markov processes.
Numerical maximization of (19.8) is straightforward with standard statistical packages. The binomial convolution involves only finite summation and hence is feasible. Note also that, since the pgf of a binomial distribution is a polynomial, the pmf of the convolution can be derived easily via polynomial multiplication using packages in R. Depending on the choice of innovation distribution, the conditional maximum likelihood (CML) approach can be applied. In Pedeli and Karlis (2013c), a bivariate Poisson and a bivariate negative binomial distribution were used, and prediction for these parametric models was discussed. An interesting result is that for bivariate Poisson innovations the univariate series have a Hermite marginal distribution. In Karlis and Pedeli (2013), a copula-based bivariate innovation distribution was used, allowing negative cross-correlation.
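To illustrate the computation, the following is a minimal sketch of the conditional log-likelihood (19.8) under the simplifying assumption of independent Poisson innovations, so that $f_3$ factorizes into a product of two Poisson pmfs; for dependent innovations one would replace `f3` with a bivariate Poisson or copula-based pmf as in the references above.

```python
import numpy as np
from scipy.stats import binom, poisson

def binconv_pmf(k, n1, p1, n2, p2):
    """pmf at k of the convolution Bin(n1, p1) + Bin(n2, p2) (the f1 and f2 in the text)."""
    j = np.arange(max(0, k - n2), min(k, n1) + 1)
    return np.sum(binom.pmf(j, n1, p1) * binom.pmf(k - j, n2, p2))

def cond_loglik(theta, y):
    """Conditional log-likelihood (19.8) of a BINAR(1) model.
    theta = (a11, a12, a21, a22, lam1, lam2); y is a T x 2 array of counts.
    Assumes independent Poisson innovations: f3(r1, r2) = Pois(r1; lam1) Pois(r2; lam2)."""
    a11, a12, a21, a22, lam1, lam2 = theta
    ll = 0.0
    for t in range(1, len(y)):
        p = 0.0
        for k in range(y[t, 0] + 1):
            f1 = binconv_pmf(k, y[t - 1, 0], a11, y[t - 1, 1], a12)
            if f1 == 0.0:
                continue
            for s in range(y[t, 1] + 1):
                f2 = binconv_pmf(s, y[t - 1, 1], a22, y[t - 1, 0], a21)
                p += (f1 * f2 * poisson.pmf(y[t, 0] - k, lam1)
                      * poisson.pmf(y[t, 1] - s, lam2))
        ll += np.log(p)
    return ll
```

Because the binomial convolutions involve only finite sums, each likelihood evaluation is exact; the resulting function can be passed to a general-purpose optimizer (e.g., minimizing its negative with `scipy.optimize.minimize`), with each $\alpha$ constrained to $(0, 1)$ and each $\lambda$ to $(0, \infty)$.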
When moving to the multivariate case, things become more demanding. First of all, a multivariate discrete distribution is needed for the innovations. As discussed in Section 19.2, such models can be complicated. In Pedeli and Karlis (2013a), a multivariate Poisson distribution is assumed with a diagonal matrix $A$. Even in this case, the pmf of the multivariate Poisson distribution is demanding since multiple summation is needed. The conditional likelihood can be derived as in the bivariate case, but now it is a convolution of several binomials and a multivariate discrete distribution. Alternatively, a composite likelihood approach can be used. Composite likelihood methods are based on the idea of constructing lower-dimensional score functions that still contain enough information about the structure considered but are computationally more tractable (Varin, 2008). See also Davis and Yau (2011) for asymptotic properties of composite likelihood methods applied to linear time series models.