255 Bayesian Modeling of Time Series of Counts with Business Applications
The conditionals of $Y_{jt}$ will also be negative binomial-type distributions. The dynamic
conditional mean (or regression) of $Y_{jt}$ given $Y_{it}$ can be obtained as

$$E[Y_{jt} \mid Y_{it}, \lambda_i, \lambda_j, D_{t-1}] = \frac{\lambda_j}{\lambda_i + \gamma\beta_{t-1}} \left(\gamma\alpha_{t-1} + Y_{it}\right), \tag{11.40}$$

which is linear in $Y_{it}$. It can be easily seen that the bivariate counts are positively correlated
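and the correlation is given by

$$\mathrm{Cor}(Y_{it}, Y_{jt} \mid \lambda_i, \lambda_j, D_{t-1}) = \sqrt{\frac{\lambda_i \lambda_j}{(\lambda_i + \gamma\beta_{t-1})(\lambda_j + \gamma\beta_{t-1})}}. \tag{11.41}$$
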
Other properties of the dynamic multivariate distribution are given in Aktekin et al. (2014).
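
As a numerical illustration of (11.40) and (11.41), the following is a minimal sketch in Python. The function and argument names are hypothetical and not taken from the source; `alpha_prev` and `beta_prev` stand for the gamma parameters carried over from time t-1.

```python
import math


def conditional_mean(y_it, lam_i, lam_j, alpha_prev, beta_prev, gamma):
    """E[Y_jt | Y_it, lam_i, lam_j, D_{t-1}] following (11.40)."""
    return lam_j * (gamma * alpha_prev + y_it) / (lam_i + gamma * beta_prev)


def conditional_correlation(lam_i, lam_j, beta_prev, gamma):
    """Cor(Y_it, Y_jt | lam_i, lam_j, D_{t-1}) following (11.41)."""
    b = gamma * beta_prev  # discounted rate parameter
    return math.sqrt(lam_i * lam_j / ((lam_i + b) * (lam_j + b)))


if __name__ == "__main__":
    # Illustrative values only; they are not estimates from the chapter.
    print(conditional_mean(3, 1.0, 2.0, 10.0, 5.0, 0.5))
    print(conditional_correlation(1.0, 2.0, 5.0, 0.5))
```

The conditional mean changes by the constant amount lam_j / (lam_i + gamma * beta_prev) for each unit increase in y_it, which is the linearity noted above, and the correlation always lies strictly between 0 and 1 for positive inputs.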
The estimation of this model using MCMC would be straightforward using the FFBS
algorithm for the $\theta_t$'s in conjunction with a Gibbs sampler step for the $\lambda_j$'s, whose full
conditionals are given by

$$p(\lambda_j \mid \theta_1, \ldots, \theta_t, D_t) \sim \text{Gamma}(a_{jt}, b_{jt}), \tag{11.42}$$

where $a_{jt} = a_j + Y_{j1} + \cdots + Y_{jt}$ and $b_{jt} = b_j + \theta_1 + \cdots + \theta_t$. By iteratively sampling
from the conditional distributions of $(\theta_1, \ldots, \theta_t \mid \lambda_1, \ldots, \lambda_J, D_t)$ using the FFBS algorithm and
$(\lambda_j \mid \theta_1, \ldots, \theta_t, D_t)$ for all $j$, one can obtain samples from $(\theta_1, \ldots, \theta_t, \lambda_1, \ldots, \lambda_J \mid D_t)$.
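
The Gibbs step for the λ_j's in (11.42) can be sketched as follows. This is only an illustration under stated assumptions: the gamma distribution is taken to be in shape-rate form, the θ draws are assumed to come from a separate FFBS step that is not shown, and the name `sample_lambdas` and its arguments are hypothetical.

```python
import random


def sample_lambdas(counts, thetas, a, b, rng):
    """Draw each lambda_j from Gamma(a_jt, b_jt) as in (11.42).

    counts : list of J lists, counts[j] = [Y_j1, ..., Y_jt]
    thetas : list [theta_1, ..., theta_t], assumed drawn by an FFBS step
    a, b   : lists of prior shape and rate parameters a_j, b_j
    rng    : a random.Random instance
    """
    theta_sum = sum(thetas)
    draws = []
    for y_j, a_j, b_j in zip(counts, a, b):
        a_post = a_j + sum(y_j)   # a_jt = a_j + Y_j1 + ... + Y_jt
        b_post = b_j + theta_sum  # b_jt = b_j + theta_1 + ... + theta_t
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        draws.append(rng.gammavariate(a_post, 1.0 / b_post))
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative inputs only; they are not data from the chapter.
    print(sample_lambdas([[3, 4, 5], [0, 1, 2]], [1.0, 1.2, 0.8], [1.0, 1.0], [1.0, 1.0], rng))
```

In a full sampler this function would be called once per MCMC iteration, alternating with a fresh FFBS draw of the θ_t's conditional on the current λ_j's.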
11.5 Business Applications
In order to show how the models are applied to count time series in business applications,
we have used three data sets. Example 11.1 consists of time series counts of the number of
calls arriving at a call center in a given time interval. Example 11.2 consists of the
number of people who defaulted in a given mortgage pool. Example 11.3 consists of the
number of weekly grocery store visits for households. We discuss the implementation and
the estimation of the proposed Poisson time series models using these three examples.
11.5.1 Example 11.1: Call Center Arrival Count Time Series Data
To show the use of the basic model without any covariates, we consider the time series of
counts of call center arrivals during different intervals of 164 days from an anonymous U.S.
commercial bank as discussed in Aktekin and Soyer (2011). Each day consists of 169 time
intervals each of which has a duration of 5 min. On a given day, the call center is operational
between 7:00 AM and 9:05 PM.
We have only used the rst week of the data for illustration purposes and have provided
within-day updating and forecasting results separately for Monday–Friday of the week.
Such an approach would be of interest to call center managers who would like to be able
to determine staff schedules in advance for different time intervals on a given day.
256 Handbook of Discrete-Valued Time Series
FIGURE 11.1
Posterior arrival rates for different days of the week. (Vertical axis: E(θ | D); horizontal axis: t; one series each for Monday, Tuesday, Wednesday, Thursday, and Friday.)
Given the ltering distribution, we obtained the means of the latent arrival rates for
a particular time interval for each day given information on the entire data consisting of
164 days each with 169 time intervals. These are shown in Figure 11.1 from which a certain
type of ordering between the days of the week can be inferred. As such, we set the initial
prior parameters from (11.5) for the arrival rate as α
0
i
= α
0
= 0.001 and β
0
i
= β
0
= 0.001
for all i,with i representing a specic day of the week.
Furthermore, summary statistics for the posterior discounting factor, γ, for each day of
the week are shown in Table 11.1. Discounting occurs on the sum of the previously observed
values of the call arrivals for a given period and is therefore a function of the data dimension
used. Each day seems to exhibit a slightly different discount behavior. The fact that the
posterior means of the discounting terms are getting smaller as we observe more data indicates
that the model emphasizes arrival counts observed during the within-day interval of
interest (say $t$) more than the previously observed arrival counts (say $t-1, \ldots, 1$).
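
To make the role of γ concrete, the sketch below runs a discounted gamma-Poisson filter. It assumes the usual recursions for this class of models, which are not restated in this section: the prior for θ_t given D_{t-1} is Gamma(γα_{t-1}, γβ_{t-1}) in shape-rate form, and observing Y_t updates it to Gamma(γα_{t-1} + Y_t, γβ_{t-1} + 1). The function name `discount_filter` is hypothetical; the default prior values 0.001 are those quoted above.

```python
def discount_filter(counts, gamma, alpha0=0.001, beta0=0.001):
    """Return the filtered means E[theta_t | D_t] for a sequence of counts.

    Assumes the discounted gamma-Poisson updating
        alpha_t = gamma * alpha_{t-1} + Y_t
        beta_t  = gamma * beta_{t-1} + 1
    with the gamma distribution in shape-rate form.
    """
    alpha, beta = alpha0, beta0
    means = []
    for y in counts:
        alpha = gamma * alpha + y   # discount past counts, add the new one
        beta = gamma * beta + 1.0   # discount past exposure, add one period
        means.append(alpha / beta)  # mean of Gamma(alpha, beta)
    return means


if __name__ == "__main__":
    # Illustrative counts only; they are not the call center data.
    counts = [120, 135, 150, 210, 190]
    print(discount_filter(counts, gamma=0.07))
    print(discount_filter(counts, gamma=0.9))
```

With a small γ, such as the values near 0.05 to 0.09 reported in Table 11.1, the filtered mean stays close to the most recent count, whereas a γ near 1 averages over the whole history.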
TABLE 11.1
Posterior means and standard deviations of γ for different days

Day           Mean     St. Dev.
Mondays       0.066    0.0025
Tuesdays      0.046    0.0016
Wednesdays    0.092    0.0042
Thursdays     0.084    0.0039
Fridays       0.075    0.0032
11.5.2 Example 11.2: Mortgage Default Count Time Series Data
In illustrating the use of the basic model with covariates, we use data provided by the
Federal Housing Administration (FHA) of the U.S. Department of Housing and Urban
Development. These data have been analyzed in detail by Aktekin et al. (2013). In our analysis,
we use 144 monthly counts of defaulted FHA-insured single-family 30-year fixed-rate mortgage
loans from 1994 in the Atlanta region. In addition, we make use of covariates such as the
regional conventional mortgage home price index (CMHPI), federal cost of funds index
(COFI), the homeowner mortgage financial obligations ratio (FOR), and regional unemployment
rate (Unemp). A time series plot of the monthly mortgage count data is shown in
Figure 11.2, where a nonstationary behavior that can be captured by our Poisson state-space
models is observed.
In analyzing the default count data, the discounting factor γ introduced in (11.3) is
assumed to follow a discrete uniform distribution defined over (0, 1) in order to keep the
updating/filtering tractable. The posterior distribution of γ is obtained via (11.24) and is
shown in the left panel of Figure 11.3. Thus, given the posterior of γ and the FFBS algorithm,
it is possible to obtain the retrospective fit of counts. An overlay plot of the mean default
rates and the actual data is shown in the right panel of Figure 11.3. The availability of the
joint distribution of the default rate over time, that is, $p(\theta_1, \ldots, \theta_t \mid D_t)$, would be of interest
to institutions that are managing the loans for the purposes of risk management. Furthermore,
the Bayesian approach allows direct comparisons of probabilities involving Poisson
rates (in this case default rates) during different time periods. For instance, it would be
straightforward to compute the probability that the default rate during the second month
is greater than that of the first month for a given cohort, that is, $p(\theta_2 > \theta_1 \mid D_{144})$.
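
A probability such as the one comparing the second and first months can be approximated from joint posterior draws by counting how often the inequality holds. The sketch below assumes that paired draws of the two rates are already available, for example from FFBS output; the function name `prob_greater` and the example numbers are hypothetical.

```python
def prob_greater(draws_a, draws_b):
    """Monte Carlo estimate of P(a > b | data) from paired posterior draws.

    draws_a[k] and draws_b[k] must come from the same joint posterior draw,
    so that the dependence between the two rates is preserved.
    """
    if len(draws_a) != len(draws_b) or len(draws_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for x, y in zip(draws_a, draws_b) if x > y)
    return hits / len(draws_a)


if __name__ == "__main__":
    # Made-up draws for illustration; they are not results from the chapter.
    theta_2 = [1.0, 3.0, 5.0, 7.0]
    theta_1 = [2.0, 2.0, 6.0, 6.0]
    print(prob_greater(theta_2, theta_1))
```

Pairing the draws matters because the rates in adjacent months are dependent under the state-space model, so the two sets of marginal draws should not be shuffled independently.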
In order to take into account the effects of covariates (macroeconomic variables in this
case) on the default rate, we have used the model with covariates. In doing so, we assume
the prior of γ to be continuous uniform over (0, 1) and those of the covariate coefficients, ψ,
to be independent normal distributions. The MCMC algorithm was run for 10,000 iterations
FIGURE 11.2
Time series plot of monthly default counts. (Vertical axis: default counts; horizontal axis: t.)
FIGURE 11.3
Posterior γ (a) and the retrospective fit to count data (b). (Panel (a): posterior against γ; panel (b): default counts against t, with series labeled Actual and Basis model.)
with a burn-in period of 2,000 iterations with no convergence issues. The posterior density
plots of ψ and γ are shown in Figure 11.4 where γ exhibits similar behavior to the posterior
discounting term obtained for the basic model as in the left panel of Figure 11.3.
Table 11.2 shows the posterior summary statistics for the covariates. All macroeconomic
variables seem to have fairly significant effects on the default rate. CMHPI, COFI, and
Unemp have positive effects on default counts. For instance, as unemployment tends to
go up, the model suggests that the number of people defaulting tends to increase for the
cohort under study. On the other hand, the homeowner FOR seems to decrease the expected
number of defaults as it goes up, namely, as the burden of repayment becomes relatively
easier, homeowners are less likely to default.
Figure 11.5 shows that the model with covariates provides a reasonably good fit to the
data. Furthermore, the behavior of the latent default rates, the $\theta_t$'s, can be described via their
joint distribution $p(\theta_1, \ldots, \theta_{144} \mid D_{144})$. A boxplot of the $\theta_t$'s is shown in Figure 11.6, which
provides insights into the stochastic and temporal behavior of the latent rates given the count
data and the relevant covariates.
11.5.3 Example 11.3: Household Spending Count Time Series Data
Our nal example utilizes the multivariate extension of the basic model in the context of
household spending. In order to illustrate the workings of the multivariate model in a sim-
ple setup, we consider bivariate count data. However, we emphasize that the multivariate
count model can be applied to higher orders relatively easily. The data consist of the weekly
grocery store visits of 540 Chicago-based households accumulated over 104 weeks, from
which we have considered two households as in Figure 11.7. In other words, we have two
different Poisson time series for the different households and assume that their visits to
the grocery store can be modeled by (11.31). Our assumption is that each household’s visit
to the grocery store is affected by the same environment, that is, the economic situation,
weather, and so on. That is, we assume that the grocery store arrival process of a household
in Chicago will exhibit behavior similar to that of any other household.
FIGURE 11.4
Posterior density plots of ψ and γ for the model with covariates. (Panels: ψ_CMHPI, ψ_COFI, ψ_FOR, ψ_Unemp, and γ.)