Chapter 4. Statistics, analytic, OLAP functions in business scenarios 181
4.3.3 Identify target groups for a campaign
The objective is to identify a particular group of customers from a larger set that
will most likely respond to a marketing campaign, thus resulting in a better return
on investment.
In our example, a financial institution would like to increase revenue by marketing
mortgages to customers during the first quarter of the new fiscal year. Previous
wider coverage state-wide and county-wide campaigns have been unsuccessful,
and the company would like to focus on a particular city for better results.
Candidate target cities chosen are Palo Alto (8 branches) and San Jose (9
branches) since they are located in the Silicon Valley area which has highest
average family income in the state of California. The rationale being that people
with high incomes generally own their homes or would like to own homes, and
therefore are ideal targets for the campaign.
The decision to choose Palo Alto or San Jose as the target city was based on the
following analysis.
1. A survey was conducted of the residents of Palo Alto and San Jose and the
results were analyzed.
2. Using the Chi-squared technique, infer from the results if a relationship exists
between where a customer lives and the product (s)he will buy.
3. If such a relationship exists, use the Wilcoxon Rank Sum Test to prove that
Palo Alto residents will likely buy mortgage loans.
Data
The main sources of data for this query are:
? Cumulative Distribution of the Chi-square table
? Survey results in the form of Survey tables
? Cumulative Distribution of the Wilcoxon Rank-Sum Statistic table
BI functions showcased
SUM, CUBE, LOG(n), GROUPING, RANK
Steps
We executed the following three steps to arrive at the answer.
Step 1
This involved collecting the survey data and loading it into a DB2 table called
SURVEY. An SQL query (not shown here) was issued against the SURVEY
table it to arrive at a Contingency table as shown in Table 4-1.
182 High-Function Business Intelligence in e-business
Table 4-1 Survey data contingency table
This data was then loaded into a DB2 table SURVEY.
Step 2
We used the Chi-squared technique to prove that there is a statistically
significant relationship between where customer lives and the product he/she will
buy by disproving the null hypothesis. In other words, disproving that where
customers live has no bearing on what products they buy.
The formula for the maximum likelihood Chi-squared test for independence is
shown in Table 4-2.
Table 4-2 Chi-squared test for independence test statistic
The SQL query for calculation of the foregoing Chi-square (X) is shown in
Example 4-20.
Product Palo Alto San Jose Total
checking/savings 45 85 130
visa 30 10 40
mortgage 110 80 190
Total 185 175 360
Chi-squared Test for Independence Test Statistic:
Maximum Likelihood X = 2n log (n)
+ [2n11 log (n11) + ... + 2n rc
- [2
n1+ log (n1+) + ... + 2nr+ log (nr+)]
- [2
n+1 log (n+1) + ... + 2n+c log (n+c)]
Where:
nij: # in cell (i,j)
ni+: row i sum
n+j: column j sum
n: total # of user
Note: Table lookups and complex calculations can be automated in DB2 via
User Defined Functions (UDFs).
Chapter 4. Statistics, analytic, OLAP functions in business scenarios 183
Example 4-20 Chi-square computation
WITH c_table (prod_name, city, n, g1, g2) AS
(
SELECT prod_name, city, count(*), 2e0*(0.5e0-GROUPING(prod_name)),
2e0*(0.5e0-GROUPING(city))
FROM survey
GROUP BY CUBE (prod_name, city)
)
SELECT SUM(g1*g2*2e0*n*log(n)) as chi2
FROM c_table
The result of this query is shown in Figure 4-28.
Figure 4-28 Chi-Squared value of city and product preference relationship
The Cumulative Distribution of the Chi-square table (not shown here) shows that
under the null hypothesis, there is no statistical relationship between where the
customer lives and the products (s)he will buy. The probability of seeing a
Chi-square values of 34.1139 or higher is less than 0.001%.
Therefore, we conclude that there is a relationship between where a person lives
and the products (s)he will buy.
Step 3
Here we use the Wilcoxon Rank Sum Test to prove that Palo Alto residents will
likely buy more mortgages than the residents of San Jose.
Here too, it involves
disproving the null hypothesis, that is we disprove that
customers in Palo Alto will
not buy more mortgages than the residents of San
Jose. This involves the following steps:
a. Collect the percentage of people who responded yes to likely buying
mortgages for each branch in Palo Alto and San Jose. Load the statistics
to a table called SURVEY_MORTG. The columns in this table are:
Cityid - city
Branchid - branch within the city
PercentYes - Percentage of branch customers who responded YES
Important: Consider using floating point data type instead of INTEGER or
DECIMAL to avoid arithmetic exceptions such as overflow, when multiple
divide and multiplication operators involved.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset