Identify target groups for a campaign

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 4. Statistics, analytic, OLAP functions in business scenarios 181

4.3.3 Identify target groups for a campaign

The objective is to identify a particular group of customers from a larger set that

will most likely respond to a marketing campaign, thus resulting in a better return

on investment.

In our example, a financial institution would like to increase revenue by marketing

mortgages to customers during the first quarter of the new fiscal year. Previous

wider coverage state-wide and county-wide campaigns have been unsuccessful,

and the company would like to focus on a particular city for better results.

Candidate target cities chosen are Palo Alto (8 branches) and San Jose (9

branches) since they are located in the Silicon Valley area which has highest

average family income in the state of California. The rationale being that people

with high incomes generally own their homes or would like to own homes, and

therefore are ideal targets for the campaign.

The decision to choose Palo Alto or San Jose as the target city was based on the

following analysis.

1. A survey was conducted of the residents of Palo Alto and San Jose and the

results were analyzed.

2. Using the Chi-squared technique, infer from the results if a relationship exists

between where a customer lives and the product (s)he will buy.

3. If such a relationship exists, use the Wilcoxon Rank Sum Test to prove that

Palo Alto residents will likely buy mortgage loans.

Data

The main sources of data for this query are:

? Cumulative Distribution of the Chi-square table

? Survey results in the form of Survey tables

? Cumulative Distribution of the Wilcoxon Rank-Sum Statistic table

BI functions showcased

SUM, CUBE, LOG(n), GROUPING, RANK

Steps

We executed the following three steps to arrive at the answer.

Step 1

This involved collecting the survey data and loading it into a DB2 table called

SURVEY. An SQL query (not shown here) was issued against the SURVEY

table it to arrive at a Contingency table as shown in Table 4-1.

182 High-Function Business Intelligence in e-business

Table 4-1 Survey data contingency table

This data was then loaded into a DB2 table SURVEY.

Step 2

We used the Chi-squared technique to prove that there is a statistically

significant relationship between where customer lives and the product he/she will

buy by disproving the null hypothesis. In other words, disproving that where

customers live has no bearing on what products they buy.

The formula for the “maximum likelihood” Chi-squared test for independence is

shown in Table 4-2.

Table 4-2 Chi-squared test for independence test statistic

The SQL query for calculation of the foregoing Chi-square (X) is shown in

Example 4-20.

Product Palo Alto San Jose Total

checking/savings 45 85 130

visa 30 10 40

mortgage 110 80 190

Total 185 175 360

Chi-squared Test for Independence Test Statistic:

Maximum Likelihood X = 2n log (n)

+ [2n11 log (n11) + ... + 2n rc

- [2

n1+ log (n1+) + ... + 2nr+ log (nr+)]

- [2

n+1 log (n+1) + ... + 2n+c log (n+c)]

Where:

nij: # in cell (i,j)

ni+: row i sum

n+j: column j sum

n: total # of user

Note: Table lookups and complex calculations can be automated in DB2 via

User Defined Functions (UDFs).

Chapter 4. Statistics, analytic, OLAP functions in business scenarios 183

Example 4-20 Chi-square computation

WITH c_table (prod_name, city, n, g1, g2) AS

(

SELECT prod_name, city, count(*), 2e0*(0.5e0-GROUPING(prod_name)),

2e0*(0.5e0-GROUPING(city))

FROM survey

GROUP BY CUBE (prod_name, city)

)

SELECT SUM(g1*g2*2e0*n*log(n)) as chi2

FROM c_table

The result of this query is shown in Figure 4-28.

Figure 4-28 Chi-Squared value of city and product preference relationship

The Cumulative Distribution of the Chi-square table (not shown here) shows that

under the null hypothesis, there is no statistical relationship between where the

customer lives and the products (s)he will buy. The probability of seeing a

Chi-square values of 34.1139 or higher is less than 0.001%.

Therefore, we conclude that there is a relationship between where a person lives

and the products (s)he will buy.

Step 3

Here we use the Wilcoxon Rank Sum Test to prove that Palo Alto residents will

likely buy more mortgages than the residents of San Jose.

Here too, it involves

disproving the null hypothesis, that is we disprove that

customers in Palo Alto will

not buy more mortgages than the residents of San

Jose. This involves the following steps:

a. Collect the percentage of people who responded “yes” to likely buying

mortgages for each branch in Palo Alto and San Jose. Load the statistics

to a table called SURVEY_MORTG. The columns in this table are:

Cityid - city

Branchid - branch within the city

PercentYes - Percentage of branch customers who responded YES

Important: Consider using floating point data type instead of INTEGER or

DECIMAL to avoid arithmetic exceptions such as overflow, when multiple

divide and multiplication operators involved.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Identify target groups for a campaign

Create new playlist

Sign In

Sign Up

Table of Contents for
Identify target groups for a campaign