112 High-Function Business Intelligence in e-business
CORRELATION examples
Using the same salary bonus example in Figure 3-1, we can quantify the strength
of the relationship with the SQL shown in Example 3-3:
Example 3-3 CORRELATION example 1
SELECT CORRELATION (salary,bonus) AS cor
FROM employee
WHERE workdept = ‘D11’
The result of the query is 0.739.
This quantitatively confirms the reasonably strong linear relationship between
salary and bonus from the employees in department ‘D11’.
Another example of correlation involving the retail industry is shown in
Example 3-4. Assume we have the transactions of purchases from all the
customers of a a retail organization selling a variety of products, and we would
like to identify customers with similar buying habits. For example, when Customer
A bought a particular product, Customer B also tended to buy the same product.
Such information can be put to effective use in targeted marketing.
A view called
transhist is created that contains the customer id, product id, and
the dollar amount purchased over all transactions.
Example 3-4 CORRELATION example 2
SELECT a.custid as custid1, b.custid as custid2,
CORR(a.amount, b.amount) AS cor
FROM transhist a, transhist b
WHERE a.prodid = b.prodid AND a.custid < b.custid
GROUP BY a.custid, b.custid
HAVING CORR(a.amount, b.amount) >= 0.5 AND COUNT(*) > 100
ORDER BY a.custid, cor DESC
This query joins the view with itself, and uses the HAVING clause to restrict the
output to cases of high correlation (>= 0.5), and to cases where there are at least
a 100 products involved, that is, there are at least 100 data points used to
compute the correlation.
The result of this query is as follows:
CUSTID1 CUSTID2 CORR
----------- ----------- -----
1026 8271 0.51
1071 2014 0.74 <=
1071 7219 0.63
2014 7219 0.58
8271 9604 0.56