230 High-Function Business Intelligence in e-business
A categorical attribute has a finite number of possible values. An example of a
categorical attribute could be gender. There are only two possible values, male or
female. Another example is the answer to a survey question where the only
allowed responses are satisfied, neutral, or dissatisfied.
The basic concept here is that if two attributes are independent, then their
expected joint frequencies factor according to the probability equation:
P(a,b) = P(a) * P(b)
where ‘P’ is the probability, and ‘a’ and ‘b’ represent the attributes.
We then see if the actual frequencies approximately satisfy the above rule.
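As a rough illustration of this rule, the following Python sketch derives the expected frequency of each combination of values from the row and column totals of a table of observed counts. The function name `expected_under_independence` and the survey counts are hypothetical, chosen only to show the arithmetic.

```python
def expected_under_independence(observed):
    """Expected cell counts if the row and column attributes are independent.

    `observed` is a list of rows of counts. Under independence,
    P(a,b) = P(a) * P(b), so each expected count is
    row_total * column_total / grand_total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical survey counts (made up for illustration):
# rows are gender (male, female),
# columns are response (satisfied, neutral, dissatisfied).
observed = [[30, 10, 10],
            [20, 20, 10]]
expected = expected_under_independence(observed)
```

Comparing `observed` with `expected` cell by cell is then the check of whether the actual frequencies approximately satisfy the rule.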
The χ² (chi-squared) statistic, given by the ChiSquared formula at the end of
this section, is computed from two quantities:
O is the observed frequency
E is the expected frequency.
Clearly, to use χ² one must determine or be given the expected frequency of a
given attribute having a specific outcome.
- In a coin flip example, the expected frequencies of heads and tails are
equal, provided no external forces act on the coin. Thus the expected
frequency is 50% for heads and 50% for tails.
- Another example, where there are only two possible outcomes but their
expected frequencies are not equal, is the categorization of the human
population as male or female across age ranges. Younger populations have
nearly equal frequencies of males and females, while older populations tend
to have higher frequencies of females than males, because females tend to
live longer than males.
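The two examples above can be sketched in Python by turning expected proportions into expected counts for a sample of a given size. The helper name `counts_from_proportions` is hypothetical, and the 40/60 split for the older population is an invented figure used only for illustration, not a demographic statistic.

```python
def counts_from_proportions(total, proportions):
    """Convert the expected proportion of each outcome into an expected count."""
    return {outcome: total * p for outcome, p in proportions.items()}


# Fair coin: heads and tails are expected equally often in 100 flips.
coin = counts_from_proportions(100, {"heads": 0.5, "tails": 0.5})

# Older population of 200 people: the 40/60 split is a made-up assumption.
older = counts_from_proportions(200, {"male": 0.4, "female": 0.6})
```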
Once we have the expected frequency and have our observed data categorized,
χ² requires the calculation of our observed frequency for each category.
a. After that, the difference between the observed and expected frequencies
is squared for each category, so that all values are positive.
b. Divide each squared difference by the expected frequency for its category,
and sum these ratios.
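A minimal Python sketch of this calculation follows. It applies the ChiSquared formula by dividing each squared difference by the expected frequency of its category and then summing. The function name `chi_squared` and the coin flip counts are hypothetical.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 coin flips gave 60 heads and 40 tails,
# against an expected 50 heads and 50 tails.
statistic = chi_squared([60, 40], [50, 50])
print(statistic)  # 4.0
```

If the observed counts matched the expected counts exactly, the function would return zero.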
The closer this value is to zero, the more likely the attributes are
independent. If χ² is large, there is a high probability the attributes are
dependent. Tables of probability versus χ² are published in many statistics
textbooks.
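For the simplest case, the table lookup can be approximated in code. The sketch below assumes one degree of freedom (for example, two categories such as heads and tails), a concept the text does not spell out. In that case the probability of seeing a value at least as large as a given χ², if the expected frequencies are correct, equals erfc(sqrt(χ²/2)). The function name `p_value_one_df` is hypothetical, and other degrees of freedom would need a statistics library or a published table.

```python
import math


def p_value_one_df(chi2):
    """Probability of a value >= chi2 for chi-squared with 1 degree of freedom."""
    if chi2 < 0:
        raise ValueError("chi-squared statistic cannot be negative")
    return math.erfc(math.sqrt(chi2 / 2.0))


# A statistic of 4.0 (for example, 60 heads and 40 tails in 100 flips)
# gives a probability of roughly 0.046.
p = p_value_one_df(4.0)
```

A small probability indicates that the observed frequencies are unlikely under the expected ones, while a probability near 1 corresponds to a χ² near zero.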
ChiSquared:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where the sum runs over all categories.