Appendix A. Introduction to statistics and analytic concepts 229
A.1.8 Wilcoxon rank sum test
The Wilcoxon rank sum test is used to test the assumption that outcomes for a
variable (or a set of variables) are unrelated to the conditions under which they
are observed. That is, differences between outcomes under the test and control
conditions are purely random and the two conditions have no effect on the outcome.
The basic concept here is that if there is no difference between attribute values
for two populations, then ranks from one population should not be systematically
higher or lower than those of the other. Let W be the sum of the ranks for one
population; the distribution of W is known under the null hypothesis. If the
computed W is so large that the probability of observing a value greater than or
equal to it is small, then reject the null hypothesis.
A potential use of this test is determining the success or failure of a marketing
campaign in a particular demographic area by comparing the sales results of a
campaign test area with those of another area where no campaign was in effect.
The test methodology involves:
1. Determining an overall ranking of all the outcomes in both the test and control
groups.
2. Summing the ranks of the test group and comparing this sum with the sum of
ranks expected if there is no effect in the test group. The expected
sum is based on the number of observations made in the test and
control groups.
3. Looking up in published tables the probability that the expected and observed
rank sums differ by at least the observed amount.
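The three steps above can be sketched in Python as follows. This is a minimal illustration, not the book's own implementation: the sales figures are hypothetical, the function names are invented for this example, and step 3 uses a normal approximation (without a tie correction) in place of a published table.

```python
import math

def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(test, control):
    n1, n2 = len(test), len(control)
    # Step 1: rank all outcomes from both groups together.
    ranks = average_ranks(list(test) + list(control))
    # Step 2: observed rank sum W for the test group, and the sum expected
    # under the null hypothesis, which depends only on the group sizes.
    w = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    # Step 3 (assumption): normal approximation instead of a table lookup.
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))  # approximate P(W >= observed)
    return w, expected, p_upper

# Hypothetical weekly sales for a campaign area and a control area.
test_area = [12, 15, 14, 18, 20]
control_area = [10, 11, 13, 9, 16]
w, expected, p = rank_sum_test(test_area, control_area)
print(f"W = {w}, expected = {expected}, one-sided p ~ {p:.3f}")
```

For samples this small, an exact table is more appropriate than the normal approximation; the approximation is used here only to keep the sketch self-contained.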
A.1.9 Chi-Squared test
The Chi-squared ($\chi^2$) test is used to determine if two or more categorical
attributes are independent.
Important: The Wilcoxon test is attractive because it is a non-parametric test.
In other words, it does not require that the distribution of the attribute values
have a specified functional form such as a normal distribution, gamma
distribution, etc. Other tests like the TWO SAMPLE t TEST are parametric
tests which make assumptions that the data is normally distributed. The
absence of these limitations makes the Wilcoxon test attractive.
230 High-Function Business Intelligence in e-business
A categorical attribute has a finite number of possible values. An example of a
categorical attribute could be gender. There are only two possible values, male or
female. Another example is the answer to a survey question where the only
allowed responses are satisfied, neutral, or dissatisfied.
The basic concept here is that if two attributes are independent, then the
expected joint frequencies factor into the product of the individual
probabilities:
P(a,b) = P(a) * P(b)
where P is the probability, and a and b represent values of the two attributes.
We then see if the actual frequencies approximately satisfy the above rule.
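As an illustration of the factoring rule, the expected count for each cell of a two-attribute table is (row total * column total) / grand total, which is N * P(a) * P(b) with the probabilities estimated from the totals. The sketch below assumes hypothetical survey counts; the function name and data are invented for this example.

```python
def expected_counts(observed):
    """Expected cell counts if the row and column attributes are independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # N * P(a) * P(b) = N * (row/N) * (col/N) = row * col / N
    return [[r * c / total for c in col_totals] for r in row_totals]

# Hypothetical counts: rows are gender (male, female),
# columns are survey responses (satisfied, neutral, dissatisfied).
observed = [[30, 10, 10],
            [20, 20, 10]]
expected = expected_counts(observed)
for row in expected:
    print(row)
```

Comparing each observed count with its expected count shows how far the data departs from the independence rule.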
The formula for $\chi^2$ is:
$\chi^2$ = sum over all categories of (O - E)^2 / E
Where:
O is the observed frequency
E is the expected frequency.
Clearly, to use $\chi^2$ one must determine or be given the expected frequency of a
given attribute having a specific outcome.
- In a coin flip example, the expected frequencies of heads and tails are
equal, given there are no external forces in play on our coin. Thus the
expected frequency is 50% for heads and 50% for tails.
- Another example, where there are only two possible outcomes but their
expected frequencies are not equal, is the categorization of the human
population into males and females for different age ranges.
Younger populations are nearly equal in frequency of males and females,
while older populations tend to have higher frequencies of females than males,
because females tend to live longer than males.
Once we have the expected frequency and have our observed data categorized,
$\chi^2$ requires the calculation of our observed frequency for each category.
a. After that, the difference between the observed and expected frequencies is
squared for each category so that all values are positive.
b. Divide each squared difference by the expected frequency for its category,
and sum the resulting ratios.
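The calculation can be sketched for the coin flip case as follows. The flip counts are hypothetical and the function name is invented for this example; each squared difference is divided by the expected frequency of its own category before summing, as in the standard Pearson statistic.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical result of 100 coin flips: 58 heads, 42 tails.
observed = [58, 42]
# A fair coin is expected to give 50% heads and 50% tails.
expected = [50, 50]
stat = chi_squared(observed, expected)
print(f"chi-squared = {stat:.2f}")
```

Here each category contributes 8^2 / 50 = 1.28, for a total of 2.56.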
The closer this value is to zero, the more likely the attributes are independent. If
$\chi^2$ is large, there is a high probability the attributes are dependent. Tables of
probability versus $\chi^2$ are published in many statistics textbooks.
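In place of a printed table, the probability of a $\chi^2$ value at least as large as the one observed can be computed directly in simple cases. The sketch below is an illustration under stated assumptions: it introduces the degrees of freedom (a parameter the text above does not discuss, equal to the number of categories minus one for a single attribute), and it covers only 1 and 2 degrees of freedom, where the probability has a simple closed form. The function name is invented for this example.

```python
import math

def chi_squared_p_value(stat, dof):
    """P(chi-squared >= stat) for 1 or 2 degrees of freedom only."""
    if stat < 0:
        raise ValueError("the chi-squared statistic cannot be negative")
    if dof == 1:
        return math.erfc(math.sqrt(stat / 2))
    if dof == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")

# A two-category example (such as a coin flip) has 1 degree of freedom.
print(chi_squared_p_value(2.56, 1))
# The familiar 5% critical values from textbook tables.
print(chi_squared_p_value(3.841, 1))
print(chi_squared_p_value(5.991, 2))
```

A small probability indicates that a $\chi^2$ value this large would rarely arise by chance if the expected frequencies were correct.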
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$