226 High-Function Business Intelligence in e-business
In order to measure whether the sample data is extreme enough to contradict the
null hypothesis, a test statistic is used. This is a random variable with a known
probability distribution that can be used to measure how likely we are to get such
sample data given that the null hypothesis is true. If, given the null hypothesis,
the probability of getting such a value is extremely low, then we would be inclined
to reject the null hypothesis in favor of the alternative hypothesis. On the other
hand, if the probability is not too small, then the null hypothesis might well be true
and we would not reject it (informally, we would accept it).
Some well-known test statistics include the chi-squared statistic and the
Wilcoxon Rank Sum Test ‘W’ statistic.
Before computing the test statistic, a significance level needs to be set. This is
the cutoff probability below which we consider a result too unlikely to have happened
by chance, and it is typically either 5% or 1%. The probability that, given the null
hypothesis, you would get sample data this extreme or more extreme is called the
p-value of the test. Once the p-value is found, it is compared to the significance
level. If the p-value is larger than the significance level, then you do not reject
(informally, you accept) the null hypothesis. If the p-value is less than the
significance level, then you reject the null hypothesis.
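The decision rule above can be sketched in a few lines of Python. This is a minimal
illustration, not part of the original text: the function name decide, the test
statistic value 2.1, and the assumption that the statistic follows a standard normal
distribution under the null hypothesis are all hypothetical choices made for the
example.

```python
from statistics import NormalDist


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the significance level alpha (hypothetical helper)."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


# Assumption: the test statistic is standard normal under the null hypothesis,
# and only large values count as evidence against it.
z = 2.1  # hypothetical observed value of the test statistic
p_value = 1.0 - NormalDist().cdf(z)  # probability of a value this large or larger

print(f"p-value = {p_value:.4f}")
print("at the 5% level:", decide(p_value, alpha=0.05))
print("at the 1% level:", decide(p_value, alpha=0.01))
```

With these assumed numbers, the same sample leads to rejection at the 5% level but
not at the 1% level, which is why the significance level has to be fixed before the
test statistic is computed.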
In the first case (PP is equal to ‘a’), the null hypothesis can be proved wrong in
two ways as follows:
1. PP is bigger than ‘a’.
2. PP is smaller than ‘a’.
This kind of test is called a two-tailed hypothesis test, because in the graph of
the probability distribution of the test statistic, the “tails” to both the left and right
correspond to the rejection of the null hypothesis.
The second case, where rejection occurs because we think that PP is smaller
than the constant ‘a’, is called a left-tailed hypothesis test.
The third case, where rejection occurs because we think that PP is larger than the
constant ‘a’, is called a right-tailed hypothesis test.
Collectively, the left-tailed and right-tailed hypothesis tests are called one-tailed
tests.
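The difference between the three kinds of test shows up in which tail area is counted
as the p-value. The following Python sketch illustrates this under a stated
assumption that is not in the original text: the test statistic is taken to follow a
standard normal distribution when the null hypothesis is true. The function name
tail_p_value and the sample value 1.5 are hypothetical.

```python
from statistics import NormalDist


def tail_p_value(z: float, tail: str) -> float:
    """Return the p-value of an observed statistic z for the given kind of test.

    Assumption: z is standard normal under the null hypothesis.
    tail is "left", "right", or "two".
    """
    cdf = NormalDist().cdf(z)  # area to the left of z
    if tail == "left":
        return cdf  # small values of z contradict the null hypothesis
    if tail == "right":
        return 1.0 - cdf  # large values of z contradict the null hypothesis
    if tail == "two":
        return 2.0 * min(cdf, 1.0 - cdf)  # extreme values on either side count
    raise ValueError("tail must be 'left', 'right', or 'two'")


z = 1.5  # hypothetical observed value of the test statistic
for kind in ("left", "right", "two"):
    print(f"{kind}-tailed p-value: {tail_p_value(z, kind):.4f}")
```

For the same observed value, the two-tailed p-value is twice the smaller of the two
one-tailed p-values, so a two-tailed test needs a more extreme statistic to reject the
null hypothesis at the same significance level.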
A.1.7 HAT diagonal
The HAT diagonal is used in conjunction with linear regression.
As discussed, linear regression involves a best fit of a collection of x,y pairs to a
mathematical equation of the form: