4.4 Order Quantity and Order Size

4.4.1 How Much (in $) Will a Customer Order?

Loyalty programs are designed to increase customer loyalty and to encourage customers to make more purchases or use more services. Bolton et al. [22] investigated whether loyalty programs increase customers' satisfaction and their usage levels of products and services. Because their data contained customers' transactions, these authors directly modeled the number of transactions during a given period using a Tobit model. Bolton and Lemon [13] modeled customers' usage level with a dynamic model. They argued that a customer uses the service at all only when usage exceeds some minimum, unobserved level (a threshold), say c. The amount of usage actually observed can then be written as

(4.30) equation
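The threshold idea above is the logic of a Tobit (censored regression) model. The following is a minimal Python sketch of its log-likelihood. It rests on several assumptions. Latent usage is taken to be x'b + e with e ~ N(0, σ²). Observed usage is assumed to be recorded as 0 whenever latent usage is at or below c; the exact coding in [13] may differ. The function names and toy numbers are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tobit_loglik(y, xb, sigma, c):
    """Log-likelihood of a threshold (Tobit) model.

    Hypothetical coding: y[k] == 0 means latent usage was at or below the
    threshold c (censored); a positive y[k] is the latent usage itself.
    xb[k] is the linear predictor x_k'b; sigma is the error standard deviation.
    """
    ll = 0.0
    for yk, mk in zip(y, xb):
        if yk == 0:
            # Censored: probability that latent usage did not exceed c.
            ll += math.log(norm_cdf((c - mk) / sigma))
        else:
            # Uncensored: normal density of the observed usage level.
            ll += math.log(norm_pdf((yk - mk) / sigma) / sigma)
    return ll

# Toy data (hypothetical): two non-users and two observed usage levels.
y = [0.0, 0.0, 3.2, 5.1]
xb = [1.0, 2.0, 3.0, 4.0]
print(tobit_loglik(y, xb, sigma=1.5, c=2.0))
```

Maximizing this function over b and σ, for example with a numerical optimizer, gives the Tobit estimates. With c = 0 it reduces to the familiar Tobit censored at zero.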

The independent variables in the future usage equation are overall satisfaction, price, and a vector of cross-sectional economic variables, and this threshold specification is captured by a Tobit model. Borle et al. [23] modeled purchase amounts directly: they assumed that the amount expended by customer img on purchase occasion img, img, followed a log-normal process

(4.31) equation

where the img parameter is

(4.32) equation

where img. The coefficient img specifies the impact of gender on purchase amounts and the coefficient img captures a nonlinear trend in the purchase amounts across purchase occasions. The coefficient img specifies the impact of lagged dollars spent on future amounts expended. The authors allowed this parameter to vary across customers as

(4.33) equation

and jointly estimated the purchase-amount model with the interpurchase-time and customer-defection models using a Markov chain Monte Carlo (MCMC) sampling algorithm.
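One practical consequence of a log-normal purchase amount is worth noting. If log spend is normal with location μ and scale σ, the expected dollar amount is exp(μ + σ²/2), not exp(μ); exp(μ) is only the median. The minimal sketch below checks this by simulation. The values of mu and sigma are hypothetical and are not taken from [23].

```python
import math
import random
import statistics

def lognormal_mean(mu, sigma):
    """Expected value of a log-normal variable whose log is N(mu, sigma^2)."""
    return math.exp(mu + 0.5 * sigma ** 2)

rng = random.Random(42)
mu, sigma = math.log(100.0), 0.5   # hypothetical: median spend of $100
draws = [rng.lognormvariate(mu, sigma) for _ in range(200_000)]

print(f"median   exp(mu)             = {math.exp(mu):.2f}")
print(f"mean     exp(mu + sigma^2/2) = {lognormal_mean(mu, sigma):.2f}")
print(f"simulated mean               = {statistics.fmean(draws):.2f}")
```

The gap between median and mean matters whenever fitted log amounts are converted back into dollar predictions.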

4.4.2 How Many Items Will a Customer Order?

Anderson and Simester [14] argued that the number of units ordered can legitimately be zero, a situation that should not be treated as censoring or truncation. They therefore concluded that the Tobit model is not appropriate for modeling quantity. Instead, they treated the number of units purchased as a count measure following a Poisson distribution, and specified the number of units ordered by customer img as

(4.34) equation

where img is assumed to be drawn from a Poisson distribution with parameter img, img, and img include the RFM measures and a dummy variable indicating whether customer img received the promotion. We provide an introduction to the Poisson regression model in Appendix J.
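To make the Poisson specification concrete, here is a minimal sketch of maximum-likelihood Poisson regression with a log link, E[y | x] = exp(x'β), fitted by Newton-Raphson in pure Python. The toy data, the single promotion dummy and the helper names are hypothetical. An actual application would also include the RFM measures and would typically use a statistics package.

```python
import math

def solve(a, b):
    """Solve a x = b for a small dense system by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def poisson_regression(X, y, iters=50, tol=1e-10):
    """Poisson MLE with log link by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        mu = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]
        # Score X'(y - mu) and information X' diag(mu) X.
        grad = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu)) for j in range(k)]
        info = [[sum(row[j] * row[t] * mi for row, mi in zip(X, mu)) for t in range(k)]
                for j in range(k)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical data: units ordered, and whether the customer got the promotion.
promo = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 2, 0, 3, 4, 2, 3, 3]
X = [[1.0, float(p)] for p in promo]
beta = poisson_regression(X, y)
print("intercept, promotion effect:", beta)
print("multiplicative promotion effect:", math.exp(beta[1]))
```

With a single dummy, exp(b0) equals the mean count of non-recipients (1.5 here). exp(b1) equals the ratio of the two group means (3.0 / 1.5), which gives an easy sanity check on the fit.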

Count data are often overdispersed, meaning that the conditional variance exceeds the conditional mean. Poisson regression assumes equidispersion (equality of the conditional mean and variance), so it is often too restrictive, and negative binomial regression is commonly used instead. Zhang et al. [24] adopted negative binomial regression to model the number of brand purchases in a given time period. In their study, the number X of items of brand H that household i purchased during a one-year period is assumed to follow a Poisson distribution with a mean purchase rate of img. This rate is determined by a set of explanatory variables in the negative binomial regression (NBR) model. img is parameterized as

(4.35) equation

where img denotes the explanatory variables, such as brand H's price, share of advertisements in the category, share of displays in the category, customer loyalty, household size, and household income, and img is the error term that is assumed to follow a gamma distribution in the NBR specification. An introduction to the NBR model of Cameron and Trivedi (1998) is provided in Appendix K.
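The role of the gamma-distributed error can be seen by simulation. This is a minimal sketch, assuming the error enters the household's purchase rate as a multiplicative gamma term with mean 1 and shape k. Under that assumption counts have mean μ and variance μ + μ²/k, whereas pure Poisson counts have variance equal to their mean. The parameter values and function names are hypothetical, and the Poisson sampler uses Knuth's multiplication method because the Python standard library has none.

```python
import math
import random
import statistics

def poisson_draw(rng, lam):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def nb_draw(rng, mu, k):
    """Negative binomial count as a Poisson-gamma mixture.

    The rate is mu * g with g ~ Gamma(shape=k, scale=1/k), so E[g] = 1;
    the implied count variance is mu + mu**2 / k.
    """
    g = rng.gammavariate(k, 1.0 / k)
    return poisson_draw(rng, mu * g)

rng = random.Random(7)
mu, k, n = 4.0, 2.0, 50_000   # hypothetical mean purchase rate and gamma shape
pois = [poisson_draw(rng, mu) for _ in range(n)]
nb = [nb_draw(rng, mu, k) for _ in range(n)]

print("Poisson: mean %.2f  var %.2f" % (statistics.fmean(pois), statistics.variance(pois)))
print("NegBin:  mean %.2f  var %.2f  (theory %.2f)"
      % (statistics.fmean(nb), statistics.variance(nb), mu + mu ** 2 / k))
```

Both samples have roughly the same mean. Only the gamma-mixed counts show the extra variance that motivates the NBR model.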

4.4.3 What Is the Average Order Size?

Marketing activities may also influence the order size of existing customers, measured for example by the average unit price of the products they buy. Researchers have investigated such effects using linear regression. Anderson and Simester [14] used a multiple regression estimated by OLS to analyze the effect of discount depth on the average price of products purchased by existing customers. Their explanatory variables were RFM measures, promotion dummy variables, and the average price of the products each customer bought in previous purchases. Lewis [15] modeled the average order size of existing customers with shipping fee variables, pricing variables, and coupon promotion dummy variables as explanatory variables. To account for possible endogeneity bias in the system of equations, the author used three-stage least squares to estimate order incidence, order size, and net shipping contribution.

4.4.4 Empirical Example: Order Quantity

Many firms have realized that it is not sufficient to focus only on getting a customer to repurchase. The firm should also consider how much value that purchase is likely to provide. Research in marketing has shown that order value can be a valuable predictor of a customer's future value to the firm, or can at least help justify the amount of money spent on customer retention efforts. It is therefore useful to understand the drivers of order quantity and, in turn, to predict each customer's expected order quantity given that an order is likely to occur. At the end of this example we should be able to do the following:

1. Determine the drivers of order quantity (value).
2. Predict the expected order quantity for each customer.
3. Determine the predictive accuracy of the model.

The information we need for this model includes the following list of variables:

Dependent variables
Purchase: 1 if the customer purchased in the given quarter, 0 if no purchase occurred in that quarter
Order_Quantity: the dollar value of the purchases in the given quarter
Independent variables
Lambda (λ): the inverse Mills ratio computed from the Purchase (probit) model
Lag_Purchase: 1 if the customer purchased in the previous quarter, 0 if no purchase occurred in the previous quarter
Avg_Order_Quantity: the average dollar value of the purchases in all previous quarters
Ret_Expense: dollars spent on marketing efforts to retain the customer in the given quarter
Ret_Expense_SQ: the square of Ret_Expense
Gender: 1 if the customer is male, 0 if the customer is female
Married: 1 if the customer is married, 0 if the customer is not married
Income: 1 if income ≤ $30,000; 2 if $30,000 < income ≤ $45,000; 3 if $45,000 < income ≤ $60,000; 4 if $60,000 < income ≤ $75,000; 5 if $75,000 < income ≤ $90,000; 6 if income > $90,000
First_Purchase: the value of the first purchase made by the customer in quarter 1
Loyalty: 1 if the customer is a member of the loyalty program, 0 if not

The data requirements show that to determine the drivers of order quantity we need two dependent variables: Purchase and Order_Quantity. This is because the expected order quantity is given by the following equation:

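$$E[\text{Order\_Quantity}] = P(\text{Purchase} = 1) \times E[\text{Order\_Quantity} \mid \text{Purchase} = 1]$$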

This equation shows that the expected order quantity is the probability that the customer will purchase in the given quarter multiplied by the expected value of a purchase, given that the customer makes one. Suppose we simply ran a regression with Order_Quantity as the dependent variable and ignored the probability that the customer makes a purchase. We could then obtain biased estimates because of sample selection bias.

Sample selection bias is common in marketing and has to be accounted for statistically in many modeling frameworks. In this case the customer chooses whether to purchase before deciding how much to purchase. Ignoring this choice would bias the estimates from the model and give less precise predictions of Order_Quantity. To address the issue, we need to predict two things. The first is the probability of Purchase, similar to what we did in the first empirical example in this chapter. The second is the expected value of Order_Quantity given that the customer makes a purchase. Importantly, we cannot simply estimate the two models independently, because their error terms are likely to be correlated. We therefore need a modeling framework that either estimates the coefficients of the two models simultaneously or at least accounts for the correlation between Order_Quantity and Purchase. To do this we use a two-stage modeling framework similar to that described earlier in this chapter and in Reinartz et al. [25].

The first model, for Purchase, uses the same equation as the repurchase probability example earlier in the chapter. The only difference is that we estimate the coefficients with a probit model instead of a logistic regression. The main reason is that the error term of the probit model is normally distributed with a mean of 0 and a standard deviation of 1. The probit model and the OLS regression model (which we use for Order_Quantity) both assume normally distributed errors. This makes the two models easier to combine in a two-stage framework.

Once we estimate the probit model, we create a new variable, λ, which carries the information about the correlation between the error terms of the two equations. This variable, also known as the sample selection correction variable, is then used as an independent variable in the Order_Quantity model to remove the sample selection bias from the estimates. We compute λ with the following equation, known as the inverse Mills ratio:

$$\lambda = \frac{\phi(X\beta)}{\Phi(X\beta)}$$

In this equation ϕ is the standard normal probability density function (PDF) and Φ is the standard normal cumulative distribution function (CDF). X contains the values of the variables in the Purchase model, and β contains the coefficients estimated from the Purchase model.
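As a minimal sketch, the inverse Mills ratio λ = ϕ(Xβ)/Φ(Xβ) can be computed from the probit linear index as follows. The function name is hypothetical, and in practice xb would be each observation's fitted index from the Purchase model. Using erfc for Φ is a design choice: it keeps precision when the index is very negative.

```python
import math

def inverse_mills_ratio(xb):
    """lambda = phi(xb) / Phi(xb), where xb = X*beta is the probit linear index."""
    pdf = math.exp(-0.5 * xb * xb) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-xb / math.sqrt(2.0))   # Phi(xb), accurate in the lower tail
    return pdf / cdf

for xb in (-2.0, 0.0, 2.0):
    print(xb, round(inverse_mills_ratio(xb), 4))
```

λ is large for customers who were unlikely to purchase (low index) and close to zero for near-certain purchasers. That is why it corrects the Order_Quantity equation most where selection matters most.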

Finally, we estimate a regression model for Order_Quantity that includes λ as an additional independent variable:

$$\text{Order\_Quantity} = \gamma\alpha + \mu\lambda + \varepsilon$$

In this equation Order_Quantity is the order quantity in the given time period and γ is the matrix of variables used to explain it. α is the vector of coefficients on these independent variables, μ is the coefficient on the inverse Mills ratio, λ is the inverse Mills ratio, and ε is the error term. Because the equation describes order quantity conditional on a purchase, it is estimated on the observations with Purchase = 1.

When we estimate the two-stage model, we get the following parameter estimates for each of the two equations:

[Table: parameter estimates for the Purchase (probit) and Order_Quantity (regression) equations]

We gain the following insights from the results. The coefficient on λ (μ) is positive and significant, which indicates a sample selection problem: the error term of the selection equation is positively correlated with the error term of the regression equation. All other variables in the order quantity model are significant except Married, suggesting that we have likely uncovered many of the drivers of order quantity.

Lag_Purchase is positive, suggesting that customers who purchased in the previous quarter tend to spend more in the current quarter. Avg_Order_Quantity is also positive: the higher a customer's average past order value, the higher the current order value. Ret_Expense shows diminishing returns, with a positive coefficient on Ret_Expense and a negative coefficient on Ret_Expense_SQ. This means that marketing efforts to retain and build relationships with customers do lead them to purchase more, but only up to a point. Beyond that threshold, additional marketing efforts actually decrease the value of the purchase on average. A likely explanation is that contacting customers too often can strain the relationship between the customer and the firm. Finally, four of the customer characteristic variables are positive (Gender, Income, First_Purchase, and Loyalty). Customers who are male, have a higher income, have a higher first purchase value, and are members of the loyalty program therefore tend to have larger order quantities.
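Ret_Expense enters the Order_Quantity equation both linearly and squared. The spending level at which its marginal effect changes sign therefore follows from setting the derivative to zero. Let α1 and α2 denote the coefficients on Ret_Expense and Ret_Expense_SQ; these labels are introduced here for illustration only. A short derivation for the conditional order quantity, holding λ fixed:

$$\frac{\partial\,\text{Order\_Quantity}}{\partial\,\text{Ret\_Expense}} = \alpha_1 + 2\alpha_2\,\text{Ret\_Expense} = 0 \quad\Longrightarrow\quad \text{Ret\_Expense}^{*} = -\frac{\alpha_1}{2\alpha_2}$$

With α1 > 0 and α2 < 0, Ret_Expense* is positive. Below it, additional retention spending is associated with larger orders; above it, with smaller ones.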

Our next step is to predict the value of Order_Quantity to see how well our model compares to the actual values. We do this by starting with the equation for expected order quantity at the beginning of this example:

$$E[\text{Order\_Quantity}] = \Phi(X\beta)\,(\gamma\alpha + \mu\lambda)$$

In this equation Φ is the standard normal CDF. X is the matrix of independent variable values from the Purchase equation and β is the vector of parameter estimates from the Purchase equation. γ is the matrix of independent variables from the Order_Quantity equation and α is the vector of parameter estimates from the Order_Quantity equation. μ is the parameter estimate for the inverse Mills ratio, and λ is the inverse Mills ratio. Once we have predicted Order_Quantity for each customer, we compare the prediction with the actual value from the database by computing the mean absolute deviation (MAD):

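$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n}\left|\text{Order\_Quantity}_i - \widehat{\text{Order\_Quantity}}_i\right|$$

where n is the number of observations and the hat denotes a predicted value.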

For the acquired customers we find MAD = 54.77, meaning that on average our predictions of Order_Quantity deviate from the actual values by $54.77. The benchmark model instead uses the mean value of Order_Quantity ($129.10) across all customers in quarters 2 through 12 as the prediction for every customer. Quarter 1 is dropped because many of the independent variables in both the repurchase model and the order quantity model are lagged. The benchmark gives MAD = 133.71, or $133.71. Our model therefore predicts order quantity substantially better than the benchmark case.
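The prediction and evaluation steps can be sketched in a few lines. The prediction is E[Order_Quantity] = Φ(Xβ)(γα + μλ) with λ = ϕ(Xβ)/Φ(Xβ), and MAD is compared against a mean-only benchmark. This is a minimal sketch: the linear indices, μ and the actual values are hypothetical placeholders, not the estimates from this example. Substituting λ shows that the prediction simplifies to Φ(Xβ)·γα + μ·ϕ(Xβ), which the code relies on only as a check.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def expected_order_quantity(xb, ga, mu):
    """Phi(xb) * (ga + mu * lambda), with xb = X*beta, ga = gamma*alpha, lambda = phi/Phi."""
    lam = norm_pdf(xb) / norm_cdf(xb)
    return norm_cdf(xb) * (ga + mu * lam)

def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical customer-quarter records.
xb = [1.2, -0.3, 0.5, -1.5]          # probit indices X*beta
ga = [150.0, 90.0, 120.0, 60.0]      # regression parts gamma*alpha
actual = [180.0, 0.0, 95.0, 0.0]     # observed Order_Quantity
mu = 25.0                            # coefficient on lambda

pred = [expected_order_quantity(x, g, mu) for x, g in zip(xb, ga)]
bench = [sum(actual) / len(actual)] * len(actual)   # mean-only benchmark
print("model MAD:    ", round(mad(actual, pred), 2))
print("benchmark MAD:", round(mad(actual, bench), 2))
```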

4.4.5 How Do You Implement It?

In this example we used a two-stage approach with a probit model for purchase incidence and a least squares regression for order quantity. We used multiple procedures in SAS to implement this model. First we used PROC Logistic with a probit link function to estimate the model of customer purchase behavior. Next we used a SAS Data step to compute the inverse Mills ratio from the output of the probit model. Finally we ran an OLS regression using PROC Reg with the inverse Mills ratio as an additional variable. Although we used SAS to implement this modeling framework, programs such as SPSS can be used as well.
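The three SAS steps above (probit, a data step for λ, and OLS with λ added) can be mirrored in any language. As a hedged illustration, here is a self-contained pure-Python sketch of the same two-step procedure on simulated data.

The data-generating values are hypothetical: error correlation ρ = 0.6, σ = 2, and a variable z that shifts Purchase but not Order_Quantity, which helps identify the λ coefficient. All function names are hypothetical too. Note that standard errors from a plain second-stage OLS are not corrected for λ being estimated.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cdf(z):
    return 0.5 * math.erfc(-z / SQRT2)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def probit(X, s, iters=100, tol=1e-8):
    """Stage 1: probit maximum likelihood by Fisher scoring."""
    k = len(X[0])
    b = [0.0] * k
    for _ in range(iters):
        g = [0.0] * k
        H = [[0.0] * k for _ in range(k)]
        for row, si in zip(X, s):
            eta = dot(b, row)
            P, Q, d = cdf(eta), 0.5 * math.erfc(eta / SQRT2), pdf(eta)  # Q = 1 - P
            w = d / (P * Q)
            for j in range(k):
                g[j] += (si - P) * w * row[j]            # score
                for t in range(k):
                    H[j][t] += d * w * row[j] * row[t]   # expected information
        step = solve(H, g)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(sj) for sj in step) < tol:
            break
    return b

def ols(X, y):
    """Stage 2: OLS via the normal equations (X'X) a = X'y."""
    k = len(X[0])
    XtX = [[sum(r[j] * r[t] for r in X) for t in range(k)] for j in range(k)]
    Xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(k)]
    return solve(XtX, Xty)

# Simulated data (hypothetical): purchase error u and order error e correlated via rho.
rng = random.Random(1)
rho, sigma, n = 0.6, 2.0, 10_000
Z, s, q = [], [], []
for _ in range(n):
    z, x, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    e = sigma * (rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
    Z.append([1.0, z, x])
    s.append(1 if 0.2 + 1.0 * z + 0.5 * x + u > 0 else 0)
    q.append(50.0 + 10.0 * x + e)    # Order_Quantity, observed only if s == 1

beta = probit(Z, s)
sel = [i for i in range(n) if s[i] == 1]
lam = [pdf(dot(beta, Z[i])) / cdf(dot(beta, Z[i])) for i in sel]   # inverse Mills ratio
y_sel = [q[i] for i in sel]
naive = ols([[1.0, Z[i][2]] for i in sel], y_sel)
heck = ols([[1.0, Z[i][2], lam_i] for i, lam_i in zip(sel, lam)], y_sel)

print("probit beta:", [round(v, 3) for v in beta])
print("naive OLS (const, x):", [round(v, 3) for v in naive])
print("two-step (const, x, lambda):", [round(v, 3) for v in heck])
```

The naive regression on purchasers only absorbs the selection effect into its intercept. The two-step fit recovers the data-generating values, and its λ coefficient estimates ρσ. The sign of that coefficient is what the text interprets in the results above.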
