430 Handbook of Discrete-Valued Time Series
RFM is then a weighted average of these variables, and a common rule of thumb for the
weights is 60%, 30%, and 10% for R, F, and M, respectively. The three component variables
(R, F, and M) are calculated at time t for each physician i as moving averages over the
3 months prior to time t, with respective weights 0.6, 0.3, and 0.1. To verify the validity of
these weights in our study, we ran a logistic regression on the calibration data, where the
response variable is a binary variable taking the value 1 if physician i wrote a new
prescription at time t and the value 0 otherwise, and R, F, and M are the predictors in this model.
Suppose the estimated regression coefficients are denoted by $\theta_R$, $\theta_F$, and
$\theta_M$. The weight for recency (R) was computed as
$\theta_R/(\theta_R + \theta_F + \theta_M)$. Weights for F and M may be obtained similarly.
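The normalization just described can be sketched in a few lines of Python. This is only an illustration: the function names and the coefficient values are hypothetical and are not taken from the study.

```python
def rfm_weights(theta_r, theta_f, theta_m):
    """Normalize three logistic-regression coefficients into weights summing to 1."""
    total = theta_r + theta_f + theta_m
    if total == 0:
        raise ValueError("coefficients sum to zero; weights are undefined")
    return theta_r / total, theta_f / total, theta_m / total


def rfm_score(r, f, m, weights=(0.6, 0.3, 0.1)):
    """Weighted average of the recency, frequency, and monetary components."""
    w_r, w_f, w_m = weights
    return w_r * r + w_f * f + w_m * m


# Hypothetical coefficient values, chosen only to illustrate the arithmetic.
example_weights = rfm_weights(1.2, 0.6, 0.2)
example_score = rfm_score(2.0, 1.0, 5.0, example_weights)
```

If the estimated weights come out close to the 60/30/10 rule of thumb, the rule is supported by the calibration data; a large departure would suggest using the estimated weights instead.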
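Let $\beta^{\lambda}_{i,t} = (\beta^{\lambda}_{0,i,t}, \beta^{\lambda}_{1,i,t}, \beta^{\lambda}_{2,i,t})'$
and $\beta^{\omega}_{i,t} = (\beta^{\omega}_{0,i,t}, \beta^{\omega}_{1,i,t}, \beta^{\omega}_{2,i,t})'$,
so that $\beta_{i,t} = (\beta^{\lambda\prime}_{i,t}, \beta^{\omega\prime}_{i,t})'$ is a
$p = 6$-dimensional vector. We assume the hierarchical (or structural) equation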
$$\beta_{i,t} = \gamma_t + AO_{i,t}\,\Delta_1 + AC_{i,t}\,\Delta_2 + CD_{i,t}\,\Delta_3 + Z_i\,\Omega + v_{i,t}, \tag{20.3}$$
where $AO_{i,t} = \mathrm{diag}(ao_{i,t}, \ldots, ao_{i,t})$ denotes attitudes toward the own drug,
$AC_{i,t} = \mathrm{diag}(ac_{i,t}, \ldots, ac_{i,t})$ denotes attitudes toward the competitive drug,
$CD_{i,t}$ denotes the estimates made by physician $i$ of the competitive detailing at time $t$,
$Z_i$ represents physician demographics, $v_{i,t} \sim N_p(0, V_i)$ denote the errors, and
$\gamma_t$ is the $p$-dimensional state vector whose dynamic evolution is described by the
state (or system) equation
$$\gamma_t = G\gamma_{t-1} + w_t, \tag{20.4}$$
where $G$ is an identity matrix since a random walk evolution is assumed, and
$w_t \sim N_p(0, W)$ are the state errors. The model structure assumes that customer attitudes
form in all time periods, but are observed only when customers respond to the survey. If
customer attitudes are observed at time $t$, they affect the dynamic response coefficients in the
hierarchical equation, that is, $\beta_{i,t}$, and thus affect $\gamma_{\ell}$ for
$\ell = t, t + 1, \ldots$ as well. Note that in (20.3), the predictor $R_{i,t}$ may be replaced by
$\ln(Y_{i,t-1} + 1)$.
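To make the two-level structure concrete, the following Python sketch simulates a random-walk state and then forms the response coefficients from it. It is a simplified illustration under stated assumptions: the error covariances are taken to be scalar multiples of the identity, the demographic term is omitted, the attitude and competitive-detailing inputs are scalars, and all names (`random_walk_states`, `structural_beta`, `delta1`, and so on) are hypothetical.

```python
import random


def random_walk_states(gamma0, n_periods, sd_w, rng):
    """State equation with G = I: gamma_t = gamma_{t-1} + w_t, w_t ~ N(0, sd_w^2 I)."""
    states = []
    prev = list(gamma0)
    for _ in range(n_periods):
        prev = [g + rng.gauss(0.0, sd_w) for g in prev]
        states.append(prev)
    return states


def structural_beta(gamma_t, ao, ac, cd, delta1, delta2, delta3, sd_v, rng):
    """Hierarchical equation without the demographic term (a simplification):
    beta_k = gamma_k + ao*delta1_k + ac*delta2_k + cd*delta3_k + v_k,
    with v_k ~ N(0, sd_v^2) independently for each component k."""
    return [
        g + ao * d1 + ac * d2 + cd * d3 + rng.gauss(0.0, sd_v)
        for g, d1, d2, d3 in zip(gamma_t, delta1, delta2, delta3)
    ]


rng = random.Random(2014)
p = 6
states = random_walk_states([0.0] * p, n_periods=12, sd_w=0.1, rng=rng)
# Hypothetical attitude scores and coefficient vectors for one physician at t = 12.
beta = structural_beta(states[-1], ao=4.0, ac=2.0, cd=1.0,
                       delta1=[0.05] * p, delta2=[-0.02] * p, delta3=[-0.01] * p,
                       sd_v=0.05, rng=rng)
```

Because the state follows a random walk, a shift in the coefficients at time t carries forward into later periods, which is the persistence property noted in the text.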
Venkatesan et al. (2014) also included a model for handling the endogeneity of sales
calls by modeling $D_{i,t}$ as Poisson distributed conditional on its mean $\eta_{i,t}$, and
modeling $\ln(\eta_{i,t}) = \zeta_0 + \sum_{k=1}^{p} \zeta_k \beta_{i,t,k}$. The $\zeta$
coefficients enable us to infer whether the firm considers customer sales potential and
responsiveness to sales calls in its detailing plans.
In general, endogeneity between sales and sales calls may be handled in two ways.
One approach consists of including lagged detailing as well as $D_{i,t}$ in (20.2). We,
however, use another approach that accommodates the endogeneity by explicitly modeling
the process that generates detailing $D_{i,t}$, so that including only $D_{i,t}$ in (20.2), and
not its lagged values, is sufficient. Note the similarity to the incidental parameter issue raised by
Lancaster (2000).
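As a small illustration of the log-linear detailing model, the sketch below evaluates the Poisson mean of sales calls from a physician's response coefficients. The function name and the numerical values are hypothetical.

```python
import math


def detailing_mean(zeta0, zeta, beta):
    """Poisson mean of sales calls: exp(zeta0 + sum_k zeta_k * beta_k)."""
    if len(zeta) != len(beta):
        raise ValueError("zeta and beta must have the same length")
    return math.exp(zeta0 + sum(z * b for z, b in zip(zeta, beta)))


# Hypothetical values for a p = 6 coefficient vector.
eta = detailing_mean(0.5, [0.3, 0.0, 0.1, 0.0, 0.0, 0.0],
                     [1.0, 0.2, -0.5, 0.0, 0.1, 0.3])
```

A positive coefficient on a response coefficient means that physicians who are more responsive on that dimension receive more calls on average, which is how the firm's targeting behavior can be read off the estimates.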
Fairly standard, conditionally conjugate prior distributions, as usually adopted in
HDLMs (Landim and Gamerman, 2000), are assumed: $\pi(V_i)$ is an inverse-Wishart,
$IW(n_v, S_v)$, and $\pi(W)$ is $IW(n_w, S_w)$, with $n_v = n_w = 2p + 1$ and
$S_v = S_w = (2p + 1) I_{2p+1}$; $\pi(\Delta_1)$, $\pi(\Delta_2)$, and $\pi(\Delta_3)$ are each
$MVN(0, 100 I_p)$; $\pi(\Omega)$ is $MVN(0, 100 I_{Kp})$, where $K$ denotes the number of
customer demographic predictors; $\pi(\zeta)$ is $MVN(\zeta_0, V_\zeta)$; and $\pi(\gamma_0)$ is
431 Dynamic Models for Time Series of Counts with a Marketing Application
$MVN(0, 100 I_p)$. For details on the choice of hyperparameters, see Venkatesan et al. (2014).
A Gibbs sampling algorithm is employed to estimate the posterior distribution of the model
parameters. The coefficients $\Delta_1$, $\Delta_2$, $\Delta_3$, and $\Omega$ are obtained through
suitable multivariate normal draws, the variances are routine draws from inverse Wishart
distributions, the Forward-Filtering-Backward-Sampling (FFBS) algorithm enables sampling
$\gamma_t$ (see Carter and Kohn 1994; Frühwirth-Schnatter 1994), and the Metropolis–Hastings
algorithm is used to generate samples from other parameters. Modeling details as well as detailed
results and comparisons with several other models are given in Venkatesan et al. (2014). In
particular, the deviance information criterion (DIC) was the smallest for the hierarchical dynamic
ZIP model that included attitudes in (20.3), followed by the corresponding model without
attitudes. The dynamic models performed better than the corresponding static models. The
hierarchical dynamic ZIP model also showed the best in-sample and hold-out predictive
performance, giving the smallest mean absolute deviation (MAD) both for 1-month-ahead
and 12-month-ahead predictions. Physician attitudes, when available, affected $\beta_{i,t}$ and
$\gamma_t$.
Information provided by posterior and predictive distributions from convergent MCMC
samples for the model parameters of the hierarchical dynamic ZIP model enables the firm
to make decisions about customer selection and resource allocation by analyzing the
customer lifetime value (CLV) metric. CLV was computed over 35 months, because the firm
revealed that it did not plan its sales force allocations over 3 years ahead, and is
$$CLV_i = \sum_{t=T+1}^{T+36} \frac{(1 - \omega_{i,t})\,Y_{i,t} - c_{i,t}\,D_{i,t}}{(1 + d)^{t-T}}, \tag{20.5}$$
where $T = 10$, $d$ is the discount coefficient, $c_{i,t}$ is the unit cost of a sales call, and
$Y_{i,t}$ and $D_{i,t}$ denote the predicted means of the sales and detailing, respectively.
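The discounted sum in (20.5) can be sketched as follows. This is an illustration under an assumed reading of the formula: each month contributes the predicted sales, scaled by one minus a zero-inflation probability, less the cost of the predicted sales calls, discounted by the number of months ahead. The function and argument names are hypothetical.

```python
def clv(zero_prob, sales_mean, call_cost, calls_mean, discount):
    """Discounted sum of monthly net contributions.

    Each argument except `discount` is a sequence with one entry per forecast
    month; month k (k = 1, 2, ...) is discounted by (1 + discount) ** k.
    """
    lengths = {len(zero_prob), len(sales_mean), len(call_cost), len(calls_mean)}
    if len(lengths) != 1:
        raise ValueError("all monthly sequences must have the same length")
    total = 0.0
    months = zip(zero_prob, sales_mean, call_cost, calls_mean)
    for k, (w, y, c, d_calls) in enumerate(months, start=1):
        total += ((1.0 - w) * y - c * d_calls) / (1.0 + discount) ** k
    return total


# Hypothetical three-month forecast for one physician.
value = clv(zero_prob=[0.2, 0.2, 0.3],
            sales_mean=[10.0, 9.0, 8.0],
            call_cost=[1.5, 1.5, 1.5],
            calls_mean=[2.0, 2.0, 1.0],
            discount=0.01)
```

Ranking physicians by this quantity is what drives the selection and resource-allocation comparisons reported next.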
Ongoing collection of physician attitudes via surveys requires an annual investment of
over $1 million from the firm, which would wish to evaluate whether the financial returns
from collecting and using these attitudes in modeling exceed the investment. Venkatesan
et al. (2014) used customer selection and customer-level resource allocation based on
a hold-out sample of 1000 physicians. The objective of the customer selection process is to
identify the physicians who would be profitable in the future so that they can be prioritized
for targeting. Physician-level sales and retention were predicted from months 10 to 45,
and these predictions were used to compute the physician's CLV using (20.5). Missing
attitudes in the hold-out sample were imputed using an ordered probit model (Albert and
Chib, 1993).
Predictive results from a hierarchical dynamic ZIP model that includes physician
attitude information in (20.3) were compared to results from a model that does not include
data on attitudes, in order to quantify the implications to the firm and discuss selection of
profitable physicians. Physicians can be classified into quintiles based on the actual CLV,
the CLV predicted from the hierarchical dynamic ZIP model that includes customer
attitudes, and the CLV predicted from a hierarchical dynamic ZIP model that did not include
customer attitudes. The incremental profit from including customer attitudes was
equivalent to 0.93% of the total CLV obtained from physicians identified to be in the top quintile
based on their observed profits. This implies that if the firm was targeting the top
quintile of its customer base, the returns from including customer attitudes to select the most
likely physicians to target will be 0.93% higher than not including customer attitudes.
Similarly, the returns from including customer attitudes would be higher by 3.57%,
29.62%, 79.33%, and 24.12% relative to not including customer attitudes, if the firm
targets the second, third, fourth, and fifth quintiles, respectively. The incremental profits
from including attitudes were highest for the mid-tier groups, that is, the third and fourth
quintiles.
20.4 Hierarchical Multivariate Dynamic Models for Prescription Counts
Let $Y_{it} = (Y_{1,it}, \ldots, Y_{m,it})$, for $t = 1, \ldots, T$, denote the $m$-dimensional
time series of new prescription counts from physician $i$, where $i = 1, \ldots, N$. The
components of the vector correspond to counts of the firm's own drug and the competing drugs.
We propose a finite mixture of multivariate Poisson distributions as a sampling distribution of
the $m$-dimensional vector, which allows negative as well as positive associations between
counts of the own drug and the competing drugs. We start with a review of mixtures of
multivariate Poisson distributions in Section 20.4.1 and then show a general hierarchical
dynamic modeling framework in Section 20.4.2.
20.4.1 Finite Mixtures of Multivariate Poisson Distributions
Following Mahamunulu (1967) and Johnson et al. (1997), the definition of an $m$-variate
Poisson distribution for a random vector of counts $Y$ is based on a mapping
$g : \mathbb{N}^q \to \mathbb{N}^m$, $q \ge m$, such that $Y = g(X) = AX$. Here,
$X = (X_1, \ldots, X_q)'$ is a vector of unobserved independent Poisson random variables, that is,
$X_r \sim \text{Poisson}(\lambda_r)$ for $r = 1, \ldots, q$; and $A$ is an arbitrary $m \times q$
matrix which determines the properties of the multivariate Poisson distribution. The
$m$-dimensional vector $Y = (Y_1, \ldots, Y_m)' = AX$ follows a multivariate Poisson distribution
with parameters $\lambda = (\lambda_1, \ldots, \lambda_q)'$ and pmf $MP_m(y|\lambda)$ given by
$$P(Y = y|\lambda) = \sum_{x \in g^{-1}(y)} P(X = x) = \sum_{x \in g^{-1}(y)} \prod_{r=1}^{q} P(X_r = x_r|\lambda_r), \tag{20.6}$$
where $g^{-1}(y)$ denotes the inverse image of $y \in \mathbb{N}^m$ and, for $r = 1, \ldots, q$,
the pmf of the univariate Poisson distribution is
$P(X_r = x_r|\lambda_r) = \exp(-\lambda_r)\lambda_r^{x_r}/x_r!$. The mean vector and
variance–covariance matrix of $Y$ conditional on $\lambda$ are given by
$$E(Y|\lambda) = A\lambda; \quad \mathrm{Cov}(Y|\lambda) = A\Lambda A', \tag{20.7}$$
where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_q)$. When $m = 1$, $MP_m(y|\lambda)$ in
(20.6) reduces to the univariate Poisson pmf $P(Y = y|\lambda) = \exp(-\lambda)\lambda^{y}/y!$.
Use of the multivariate Poisson distribution for modeling applications has been sparse, possibly
due to the complicated form of the pmf (20.6), which does not lend itself to easy computation.
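The moments in (20.7) are easy to evaluate directly, reading the covariance as $A\,\mathrm{diag}(\lambda)\,A'$. The sketch below does so in plain Python for a hypothetical $2 \times 3$ matrix $A$ and hypothetical rates; the function names are illustrative only.

```python
def mp_mean(A, lam):
    """Mean vector A @ lam of a multivariate Poisson vector Y = A X."""
    return [sum(a * l for a, l in zip(row, lam)) for row in A]


def mp_cov(A, lam):
    """Covariance matrix A @ diag(lam) @ A' of Y = A X."""
    m, q = len(A), len(lam)
    return [
        [sum(A[i][r] * lam[r] * A[j][r] for r in range(q)) for j in range(m)]
        for i in range(m)
    ]


# Hypothetical example: two counts sharing one common latent Poisson term.
A = [[1, 0, 1],
     [0, 1, 1]]
lam = [1.0, 2.0, 0.5]
mean = mp_mean(A, lam)   # componentwise sums of the relevant rates
cov = mp_cov(A, lam)     # off-diagonal entry equals the shared rate
```

Since every entry of $A$ and every rate is nonnegative, each covariance computed this way is nonnegative, which is why a single multivariate Poisson distribution cannot represent negative association.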
Karlis and Meligkotsidou (2005) proposed a two-way covariance structured multivariate
Poisson distribution, which permits more realistic modeling of multivariate counts
in practical applications. This distribution is constructed by setting $A = [A_1 \; A_2]$, where
$A_1 = I_m$ captures the main effects; $A_2$ captures the two-way covariance effects; $A_2$ is an
$m \times [m(m-1)]/2$ binary matrix; each column of $A_2$ has exactly two ones and $(m-2)$ zeros
and no duplicate columns exist; and $q = m + [m(m-1)]/2$. Correspondingly, split the
parameter $\lambda$ into two parts, that is, $\lambda^{(1)} = (\lambda_1, \ldots, \lambda_m)'$,
which corresponds to the $m$ main effects, and
$\lambda^{(2)} = (\lambda_{m+1}, \ldots, \lambda_q)'$, which corresponds to the $m(m-1)/2$ pairwise
covariance effects. When $m = 2$, $q = 3$, let $Y = (Y_1, Y_2)'$, and let $Y_1 = X_1 + X_3$ and
$Y_2 = X_2 + X_3$, where $X_i \sim \text{Poisson}(\lambda_i)$, $i = 1, 2, 3$. The two-way covariance
structured bivariate Poisson pmf is
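$$MP_2(y|\lambda) = \exp\{-(\lambda_1 + \lambda_2 + \lambda_3)\}\,\frac{\lambda_1^{y_1}}{y_1!}\,\frac{\lambda_2^{y_2}}{y_2!}\,\sum_{i=0}^{s}\binom{y_1}{i}\binom{y_2}{i}\,i!\left(\frac{\lambda_3}{\lambda_1\lambda_2}\right)^{i}, \tag{20.8}$$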
where $s = \min(y_1, y_2)$. When $m = 3$, $q = 6$, let $Y = (Y_1, Y_2, Y_3)'$, and let
$Y_1 = X_1 + X_4 + X_5$, $Y_2 = X_2 + X_4 + X_6$, and $Y_3 = X_3 + X_5 + X_6$, where
$X_i \sim \text{Poisson}(\lambda_i)$ for $i = 1, \ldots, 6$. The two-way covariance structured
trivariate Poisson pmf is
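$$MP_3(y|\lambda) = \exp\left(-\sum_{i=1}^{6}\lambda_i\right)\sum_{(X_4, X_5, X_6) \in C}\frac{\lambda_1^{y_1 - X_4 - X_5}\,\lambda_2^{y_2 - X_4 - X_6}}{(y_1 - X_4 - X_5)!\,(y_2 - X_4 - X_6)!}\times\frac{\lambda_3^{y_3 - X_5 - X_6}\,\lambda_4^{X_4}\,\lambda_5^{X_5}\,\lambda_6^{X_6}}{(y_3 - X_5 - X_6)!\,X_4!\,X_5!\,X_6!}, \tag{20.9}$$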
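where the summation is over the set $C$ such that
$C = [(X_4, X_5, X_6) \in \mathbb{N}^3 : \{(X_4 + X_5 \le y_1) \cap (X_4 + X_6 \le y_2) \cap (X_5 + X_6 \le y_3)\} \ne \emptyset]$.
For $m = 2$ and $m = 3$, the matrix $A$ has the respective forms
$$\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}.$$
Under this structure, the variance–covariance matrix of $Y$ given in (20.7) does not
accommodate negative associations among the components of $Y$ (Karlis and Meligkotsidou,
2005).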
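The two matrices above follow a simple pattern: an identity block followed by one indicator column per unordered pair of components. A short Python sketch that builds $A$ for any $m$ is given below; the function name is hypothetical, and the column order (pairs in lexicographic order) is an assumption chosen to match the $m = 2$ and $m = 3$ forms shown.

```python
from itertools import combinations


def two_way_matrix(m):
    """Build A = [I_m  A_2], with one column of A_2 per pair of components."""
    pairs = list(combinations(range(m), 2))  # (0, 1), (0, 2), ..., in lexicographic order
    rows = []
    for i in range(m):
        main = [1 if j == i else 0 for j in range(m)]       # row i of I_m
        pairwise = [1 if i in pair else 0 for pair in pairs]  # row i of A_2
        rows.append(main + pairwise)
    return rows


A2 = two_way_matrix(2)
A3 = two_way_matrix(3)
```

Each row of the result has $m$ entries for the main effects and $m(m-1)/2$ entries for the pairwise effects, so the number of columns is $q = m + m(m-1)/2$.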
We proposed an approach for calculating the multivariate Poisson pmf which is faster
than the recursive scheme proposed by Tsiamyrtzis and Karlis (2004). When $m = 2$,
let $y_1$ and $y_2$ denote the observed counts, and without loss of generality, assume that
$y_1 \le y_2$, so that $\min(y_1, y_2) = y_1$. Since $X_3$ is the common term in the definitions of
$Y_1$ and $Y_2$, it is straightforward to obtain the set of possible values that $X_3$ can assume,
that is, $x_3 = 0, \ldots, \min(y_1, y_2)$, and obtain the corresponding values assumed by $X_1$
and $X_2$ to be, respectively, $X_1 = y_1 - x_3$ and $X_2 = y_2 - x_3$. We have solved for all
possible sets of values for the inverse image of $y$, that is, $x \in g^{-1}(y)$. The pmf for the
bivariate Poisson distribution can be calculated using (20.8). When $m = 3$, without loss of
generality, we assume that $y_1 \le y_2 \le y_3$. The possible values for $x_4$ and $x_5$ are in
the set $C_1 = (0, \ldots, y_1)$, and the possible values for $x_6$ are in the set
$C_2 = (0, \ldots, y_2)$. We have in total $L$ different combinations for $(x_4, x_5, x_6)$, where
$L = (\text{length of set } C_1)^2 \times (\text{length of set } C_2) = (y_1 + 1)^2 (y_2 + 1)$. The
corresponding values for $X_1, X_2, X_3$ can be calculated from (20.9). Let $C^{*}$ denote the set
of $L$ different combinations of possible values for all $q = 6$ independent Poisson variables.
Since it is possible that in the set $C^{*}$, $X_1$, $X_2$, or $X_3$ may assume negative values,
a subset of $C^{*}$ which only contains nonnegative values of $X_1$, $X_2$, and $X_3$ is the
inverse image of $y$. The pmf of the trivariate Poisson distribution is then obtained using
(20.9). Computing times for evaluating the multivariate Poisson pmfs are discussed in Hu (2012).
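The enumeration idea can be sketched in Python as follows. This is a minimal illustration rather than the implementation timed in Hu (2012): it loops over the shared latent counts, recovers the remaining ones by subtraction, and skips any combination that would make them negative. Instead of sorting the observed counts first, the sketch bounds each shared count by the smaller of the two observed counts it contributes to, which yields the same set of valid combinations. All function names are hypothetical.

```python
import math


def poisson_pmf(x, lam):
    """Univariate Poisson pmf exp(-lam) * lam**x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def bivariate_poisson_pmf(y1, y2, lam):
    """Sum over the common term x3 = 0, ..., min(y1, y2)."""
    l1, l2, l3 = lam
    total = 0.0
    for x3 in range(min(y1, y2) + 1):
        total += (poisson_pmf(y1 - x3, l1) * poisson_pmf(y2 - x3, l2)
                  * poisson_pmf(x3, l3))
    return total


def trivariate_poisson_pmf(y, lam):
    """Enumerate (x4, x5, x6); keep combinations with nonnegative x1, x2, x3."""
    y1, y2, y3 = y
    total = 0.0
    for x4 in range(min(y1, y2) + 1):          # shared by Y1 and Y2
        for x5 in range(min(y1, y3) + 1):      # shared by Y1 and Y3
            for x6 in range(min(y2, y3) + 1):  # shared by Y2 and Y3
                x = (y1 - x4 - x5, y2 - x4 - x6, y3 - x5 - x6, x4, x5, x6)
                if min(x) < 0:
                    continue  # outside the inverse image of y
                term = 1.0
                for x_r, lam_r in zip(x, lam):
                    term *= poisson_pmf(x_r, lam_r)
                total += term
    return total


p2 = bivariate_poisson_pmf(1, 2, (1.0, 2.0, 0.5))
p3 = trivariate_poisson_pmf((1, 2, 3), (1.0, 1.0, 1.0, 0.5, 0.5, 0.5))
```

A useful sanity check on such code is that summing the joint pmf over the other components should recover a univariate Poisson marginal whose rate is the sum of the rates of the latent terms entering that component.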
Karlis and Meligkotsidou (2007) proposed finite mixtures of multivariate Poisson
distributions, which allow for both overdispersion in the marginal distributions and negative
correlations, and thus offer a wide range of models for real data applications. The pmf of a
finite mixture of $H$ multivariate Poisson distributions with mixing proportions
$\pi_1, \ldots, \pi_H$ is given by
$$p(y|\Theta) = \sum_{h=1}^{H} \pi_h\, MP_m(y|\lambda_h),$$
where $\Theta$ denotes the set of parameters
$(\lambda_1, \ldots, \lambda_H, \pi_1, \ldots, \pi_{H-1})$. The expectation and
covariance of $Y$ conditional on $\lambda$ are
$$E(Y|\lambda) = \sum_{h=1}^{H} \pi_h A\lambda_h; \quad \mathrm{Cov}(Y) = A\left[\sum_{h=1}^{H} \pi_h\left(\Lambda_h + \lambda_h\lambda_h'\right) - \left(\sum_{h=1}^{H} \pi_h\lambda_h\right)\left(\sum_{h=1}^{H} \pi_h\lambda_h\right)'\right]A',$$
where $\Lambda_h = \mathrm{diag}(\lambda_{1,h}, \ldots, \lambda_{q,h})$.
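To see how a mixture can produce negative association, the sketch below evaluates the mixture mean and covariance for a hypothetical two-component bivariate example in which one component favors the first count and the other favors the second. The function name and all numerical values are illustrative assumptions.

```python
def mixture_moments(A, lams, weights):
    """Mean and covariance of a finite mixture of multivariate Poisson vectors.

    lams[h] is the rate vector of component h and weights[h] its mixing proportion.
    """
    m, q = len(A), len(lams[0])
    mean_lam = [sum(w * lam[r] for w, lam in zip(weights, lams)) for r in range(q)]
    # Inner q x q matrix: sum_h pi_h (Lambda_h + lam_h lam_h') - mean_lam mean_lam'
    inner = [
        [
            sum(w * ((lam[r] if r == s else 0.0) + lam[r] * lam[s])
                for w, lam in zip(weights, lams)) - mean_lam[r] * mean_lam[s]
            for s in range(q)
        ]
        for r in range(q)
    ]
    mean_y = [sum(A[i][r] * mean_lam[r] for r in range(q)) for i in range(m)]
    cov_y = [
        [sum(A[i][r] * inner[r][s] * A[j][s] for r in range(q) for s in range(q))
         for j in range(m)]
        for i in range(m)
    ]
    return mean_y, cov_y


A = [[1, 0, 1],
     [0, 1, 1]]
# Component 1 favors the first count, component 2 favors the second.
mean_y, cov_y = mixture_moments(A, [(5.0, 1.0, 0.1), (1.0, 5.0, 0.1)], (0.5, 0.5))
```

In this example the between-component term dominates the small shared rate, so the off-diagonal covariance is negative even though each component on its own has nonnegative covariance.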
20.4.2 HMDM Model Description
A general framework for an HMDM allowing only for positive associations between
components of $Y_{i,t}$ is discussed in Ravishanker et al. (2014), by assuming a multivariate
Poisson sampling distribution. Here, we extend this general formulation to a mixture of
multivariate Poisson sampling distributions. The observation equation and a model for the latent
process $\lambda_{j,i,t,h}$ of the extended HMDM are given in the following:
$$p(y_{i,t}|\lambda_{i,t,h}) = \sum_{h=1}^{H} \pi_h\, MP_m(y_{i,t}|\lambda_{i,t,h}),$$
$$\ln \lambda_{j,i,t,h} = B_{j,i,t}'\,\delta_{j,i,t,h} + S_{j,i,t}'\,\eta_{j,h}, \quad j = 1, \ldots, q, \tag{20.10}$$
where $B_{j,i,t} = (B_{j,i,t,1}, \ldots, B_{j,i,t,a_j})'$ is an $a_j$-dimensional vector of exogenous
predictors with location-time-varying (dynamic) coefficients
$\delta_{j,i,t,h} = (\delta_{j,i,t,h,1}, \ldots, \delta_{j,i,t,h,a_j})'$ and
$S_{j,i,t} = (S_{j,i,t,1}, \ldots, S_{j,i,t,b_j})'$ is a $b_j$-dimensional vector of exogenous
predictors with static coefficients $\eta_{j,h} = (\eta_{j,h,1}, \ldots, \eta_{j,h,b_j})'$. We
assume that the model either includes $\delta_{j,i,t,h,1}$, which represents the
location-time-varying intercept, or includes $\eta_{j,h,1}$, which represents the static intercept,
that is, either $B_{j,i,t,1} = 1$ or $S_{j,i,t,1} = 1$. A simple formulation of (20.10) could set
$a_j = 1$ for $j = 1, \ldots, q$, set $b_j = b > 1$ for $j = 1, \ldots, m$, and $b_j = 0$ for
$j = m + 1, \ldots, q$, which implies using only the location-specific and time-dependent
intercept to model the Poisson means corresponding to the association portion, and the
location–time intercept together with