Wilcoxon rank sum test

Appendix A. Introduction to statistics and analytic concepts 229

A.1.8 Wilcoxon rank sum test

The Wilcoxon rank sum test is used to test the null hypothesis that the outcomes
for a variable (or a set of variables) under two differing conditions come from the
same distribution. That is, any difference between the outcomes under the test and
control conditions is purely random, and the two conditions have no effect on the
outcome.

The basic concept here is that if there is no difference between attribute values
for two populations, then ranks from one population should not be systematically
higher or lower than those of the other. The distribution of the rank sum ‘W’ for a
population is known under the null hypothesis. If the computed value of ‘W’ is so
large that the probability of observing a value greater than or equal to it is small,
then the null hypothesis is rejected.

A potential use of this test is determining the success or failure of a marketing
campaign in a particular demographic area by comparing the sales results of a
campaign test area with those of another area where no campaign was in effect.

The test methodology involves:

1. Assigning an overall ranking to all the outcomes in both the test and control
groups.

2. Summing the ranks of the test group and comparing this sum with the expected
sum of ranks, assuming there is no effect in the test group. The expected sum is
based on the number of observations made in both the test and control groups.

3. Looking up in published tables the probability that the expected and observed
results differ by at least the amount observed.
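
The three steps above can be sketched in Python. This is a minimal illustration, not the book's own procedure: the function names and the sales figures are hypothetical, and step 3 uses a normal approximation (without a tie correction) in place of the published tables, which is only rough for samples this small.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(test, control):
    """Return (observed rank sum W, expected rank sum, approximate P(W >= observed))."""
    n1, n2 = len(test), len(control)
    # Step 1: rank all outcomes from both groups together.
    ranks = average_ranks(list(test) + list(control))
    # Step 2: sum the test-group ranks and compute the sum expected under no effect.
    w = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    # Step 3 (assumption): normal approximation instead of a table lookup.
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / std
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))
    return w, expected, p_upper

# Hypothetical weekly sales: campaign area (test) versus no-campaign area (control).
test_sales = [23, 27, 31, 35, 40]
control_sales = [18, 20, 22, 25, 26]
w, expected_w, p_upper = rank_sum_test(test_sales, control_sales)
print(w, expected_w, round(p_upper, 4))
```

With these made-up figures the test group's rank sum is well above the value expected under no effect, so the upper-tail probability is small and the null hypothesis would be rejected at the usual 5% level.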

A.1.9 Chi-Squared test

The Chi-squared ($\chi^2$) test is used to determine if two or more “categorical”
attributes are independent.

Important: The Wilcoxon test is attractive because it is a non-parametric test.
In other words, it does not require that the distribution of the attribute values
have a specified functional form such as a normal distribution or a gamma
distribution. Other tests, like the two-sample t test, are parametric tests that
assume the data is normally distributed. The absence of this requirement is what
makes the Wilcoxon test attractive.

230 High-Function Business Intelligence in e-business

A categorical attribute has a finite number of possible values. An example of a

categorical attribute could be gender. There are only two possible values, male or

female. Another example is the answer to a survey question where the only

allowed responses are satisfied, neutral, or dissatisfied.

The basic concept here is that if two attributes are independent, then the
expected frequencies factor as described by the following probability equation:

P(a,b) = P(a) * P(b)

where ‘P’ is the probability, and ‘a’ and ‘b’ represent the attributes.

We then see if the actual frequencies approximately satisfy the above rule.
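
As a rough illustration of this rule, the sketch below derives the cell counts that independence would predict from the row and column totals of a table of observed counts. The function name and the survey counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def expected_counts(observed):
    """Expected cell counts if the row and column attributes were independent."""
    total = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    # P(a,b) = P(a) * P(b), so expected count = total * (row/total) * (col/total).
    return [[r * c / total for c in col_totals] for r in row_totals]

# Hypothetical survey counts; columns are satisfied, neutral, dissatisfied.
observed = [
    [30, 10, 10],  # male respondents
    [20, 20, 10],  # female respondents
]
expected = expected_counts(observed)
print(expected)
```

Comparing `observed` with `expected` cell by cell shows how far the actual frequencies depart from what independence would predict.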

The formula for $\chi^2$ is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:

O is the observed frequency

E is the expected frequency, and the sum is taken over all categories.

Clearly, to use $\chi^2$ one must determine or be given the expected frequency of a
given attribute having a specific outcome.

- In a coin flip example, the expected frequency of a head or tail being flipped is
  equal, given there are no external forces in play on our coin. Thus the
  expected frequency is 50% for heads and 50% for tails.

- Another example of expected frequency, where there are only two possible
  outcomes but the expected frequencies of the outcomes are not equal, is
  human male/female population categorization for different age ranges.
  Younger populations are nearly equal in frequency of males and females,
  while older populations tend to have higher frequencies of females than males.
  This is because females tend to live longer than males.

Once we have the expected frequency and have our observed data categorized,
$\chi^2$ requires the calculation of our observed frequency for each category.

a. The difference between the observed and expected frequencies is then
squared so that all values are positive.

b. Each squared difference is divided by the expected frequency for its category,
and these ratios are summed.

The closer this value is to zero, the more likely it is that the attributes are
independent. If $\chi^2$ is large, there is a high probability that the attributes are
dependent. Tables of probability versus $\chi^2$ are published in many statistics
textbooks.
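
The calculation can be sketched for the coin flip example. This is a minimal illustration with made-up counts (60 heads and 40 tails in 100 flips) and a hypothetical function name. In place of a table lookup it uses a closed-form tail probability that is valid only when there is a single degree of freedom, as with two categories.

```python
import math

def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical result of 100 coin flips: heads, tails.
observed = [60, 40]
# A fair coin is expected to land on each side 50% of the time.
expected = [50, 50]

stat = chi_squared(observed, expected)  # 100/50 + 100/50

# Assumption: one degree of freedom, where P(chi^2 >= x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(stat, round(p_value, 4))
```

For one degree of freedom, the commonly tabulated 5% critical value is about 3.84, so a statistic of 4.0 would lead to rejecting the hypothesis that the coin is fair at that level.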

