Chapter 5. Hypothesis Testing

Anyone can make claims and many actually do. The joy of data analysis lies partly in the ability to claim that a condition, such as an event or a property, has a measurable impact and to defend that claim with a proper analysis. A claim that is testable is called a hypothesis. This chapter looks at the concept of hypothesis testing. In hypothesis testing, we examine a condition to determine whether or not it has a measurable impact on data. Since hypotheses are limited only by one's imagination, many of them will turn out to be claims that have no measurable impact on the data. This doesn't mean that such a claim is false. Sometimes, it just isn't measurable. For this reason, we stress that you should be skeptical about every hypothesis, including your own. Be prepared to state all of your ideas in a testable form and then perform the test.

Data in a coin

If you have a coin, you have money. If you flip that coin once and record the side that faces up, you have a result. If you flip that coin a second time and record the result, you have data.

Imagine that I have in my possession a fair coin. By the term fair coin, I mean that the coin has a side for heads, a side for tails, and (when flipped) the coin will land on the ground with a 50-50 chance of either heads or tails being the side that faces up (other possibilities, such as the coin landing on its edge, are not considered). You are allowed to inspect the coin and then hand it back to me. With the coin back in my possession, I cast a magic spell. I say a few magic words and tell you that the coin is now different. The coin is now in your hands and you are allowed to inspect it again. At a glance, the coin appears to be exactly the same as it was before I cast the spell. Did my spell have any magical effect on the coin? How will you be able to tell?

Hypothesis test

You will instantly realize that this is going to require a test. Our test is divided into two equal, yet opposite, outcomes:

  • The magic spell did not have a measurable impact on the coin
  • The magic spell did have a measurable impact on the coin

We will call the first outcome the null hypothesis, which is considered to be the default position. In plain English, the null hypothesis states that there was no change as a result of our action. I took the coin and said a few magic words, but whatever I intended to do with this spell did not have a measurable impact on the coin. The second outcome is called the alternative hypothesis, which states that something did happen. My magic spell may have had an impact on the coin. There's also a possibility that I quickly shaved off a tiny edge of the coin when no one was looking to make the coin land on one side more often than the other. Our test will be able to indicate that a change was detected in the coin, but it will never be able to tell you the cause. Before performing any testing, our intuition should lean towards the idea that nothing happened, and our skepticism should be directed at the idea that something happened. Hence, the focus of our test will be the evaluation of the alternative hypothesis.

If we are not able to detect a change in our coin, we say that we failed to reject the null hypothesis. In other words, our test attempts to reject the idea that no changes were made to the coin (otherwise, there would be no point in testing). If we are not able to detect a change, we fail in that rejection. There is an important distinction here between failing to reject the null hypothesis and accepting it. When we are not able to detect a change, it means only that this particular test was unable to find one. A future test might still detect how the coin was changed. By stating that we fail to reject the null hypothesis, we leave open the possibility that a change happened, but we were not able to find it.

Establishing the magic coin test

"I know! I will flip the coin 1,000 times and count the number of times I see the coin land with heads face up."

"Good. What do you expect the number of heads to be?"

"Oh, I suppose it should be around 500."

This is an application of the expectation formula. In the following formula, X is called a random variable. It takes a value of 1 (the coin lands on heads) or 0 (the coin lands on tails or on its edge) based on the probability p. N represents the total number of possible outcomes (which, in this case, is 2). The probability $p_i$ represents the probability of the i-th outcome and $X_i$ represents that outcome's value. Since there are two outcomes, the first has a probability of 0.5 and a value of 1, and the second has a probability of 0.5 and a value of 0.

$$E[X] = \sum_{i=1}^{N} p_i X_i$$

We are going to treat heads in our example as an outcome that has a value of 1 and all other outcomes have a value of 0. The probability of heads is 0.5 and tails is (1 - 0.5), which is also 0.5:

$$E[X] = 0.5 \times 1 + (1 - 0.5) \times 0 = 0.5$$

Simply multiply 0.5 by 1,000 to get the expected number of 500 heads. However, you may realize that the true number might not be exactly 500 heads, but it should be close.
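The expectation calculation above can be sketched in a few lines of Haskell. This is only an illustration: the names expectedValue, fairCoin, and expectedHeads are ones I made up for this sketch and are not part of the chapter's module.

```haskell
-- Expected value of a discrete random variable, given as a list of
-- (probability, value) pairs. The names here are illustrative only.
expectedValue :: [(Double, Double)] -> Double
expectedValue outcomes = sum [p * x | (p, x) <- outcomes]

-- A fair coin: heads counts as 1, every other outcome counts as 0.
fairCoin :: [(Double, Double)]
fairCoin = [(0.5, 1), (0.5, 0)]

-- Expected number of heads in n independent flips of the fair coin.
expectedHeads :: Int -> Double
expectedHeads n = fromIntegral n * expectedValue fairCoin

main :: IO ()
main = do
  print (expectedValue fairCoin)  -- 0.5
  print (expectedHeads 1000)      -- 500.0
```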

Understanding data variance

"We will set a window value. If the number of heads is within 500 plus or minus this window value, we will not be able to claim that the spell had an effect on the coin."

Sounds good. However, what should we set our window value to? We need to know how much our random variable spreads out over time. If we set the window value to 0, only an experiment with exactly 500 heads will pass. That might be too strict given that even a perfectly fair coin will not produce this value every time. If we set a window value of 50 (meaning plus or minus 50 heads), most people will agree that it is within the natural spread of a fair coin. If we set a window value of 100, we might still have a high number of people who believe that our coin was fair, but it should be fewer than when we set our window value to 50. If we set the window value to 300 (this means that we can have the total number of heads as low as 200 and as high as 800), it will be reasonable to conclude that this test is not useful to us. We need to understand how this variable spreads naturally. An understanding of this spread is called variance.

However, how do we evaluate this thought experiment without data? We need to study the problem itself. A Bernoulli trial is a single experiment (in this case, a coin flip) that results in a success (heads) or a failure (all other outcomes). When the trial is repeated, say, 5 times, we get sequences of heads and tails, such as HHTHT or THHHT. We can convert these sequences into values: 11010 and 01110. Each sequence has a computable total number of successes (in our small example, both sequences have a total of three successes), and the totals from many such sequences can be plotted as a histogram. This small example has 6 possible totals, from 0 successes to 5 successes.
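For five flips, the histogram described above is small enough to build by brute force. The following is a minimal sketch that enumerates every sequence and counts how many sequences produce each total. The function names allSequences and totalsHistogram are my own, chosen for this illustration.

```haskell
import Control.Monad (replicateM)
import qualified Data.Map.Strict as Map

-- Every possible sequence of n flips, with heads as 1 and tails as 0.
allSequences :: Int -> [[Int]]
allSequences n = replicateM n [0, 1]

-- Histogram: how many sequences produce each total number of heads.
totalsHistogram :: Int -> Map.Map Int Int
totalsHistogram n =
  Map.fromListWith (+) [(sum flips, 1) | flips <- allSequences n]

main :: IO ()
main = print (Map.toList (totalsHistogram 5))
-- [(0,1),(1,5),(2,10),(3,10),(4,5),(5,1)]
```

Of the 32 possible sequences, 10 have exactly three heads, which includes the two sequences used as examples. This approach does not scale: the number of sequences doubles with every extra flip.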

Probability mass function

We could illustrate this experiment by computing the total of every possible sequence of 1,000 coin flips, of which there are $2^{1000}$ combinations (type 2 ^ 1000 into GHCi to see the size of this number). There is a better way. When many Bernoulli trials happen, the totals of these trials follow a binomial distribution. The following is a convenient formula that recreates the binomial distribution, which we will implement as a function called probabilityMassFunction. Plotting probabilityMassFunction produces the same shape as a histogram of all the possible outcomes of coin flips. The formula for this can be denoted as:

$$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$$

This formula can be framed as a sentence in the following way: an event that succeeds with the probability p will succeed exactly k times out of n trials with a probability determined by multiplying three terms together. These are the probability of the k desired successes, the probability of the (n-k) desired failures, and the number of possible ways that k successes can be arranged in n trials.
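As a small worked instance, take the five-flip example from earlier with a fair coin (p = 0.5). The probability of seeing exactly three heads works out as:

```latex
P(X = 3) = \binom{5}{3} (0.5)^{3} (1 - 0.5)^{5 - 3}
         = 10 \times 0.125 \times 0.25
         = 0.3125
```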

We can create a histogram of coin flips using Haskell. To make our lives easier, we are going to install the exact-combinatorics package. This package contains a function that allows us to quickly calculate the number of possible arrangements of k successes out of n trials:

$ cabal install exact-combinatorics

At the top of our LearningDataAnalysis05.hs file, make sure that our package is imported:

import Math.Combinatorics.Exact.Binomial

Next, let's craft the function:

-- Probability of exactly k successes in n trials with success probability p
probabilityMassFunction ::
    Integral a => a -> a -> Double -> Double
probabilityMassFunction k n p =
    (fromIntegral (n `choose` k))
      * (p^k) * ((1-p)^(n-k))

We will call the plot function (introduced in Chapter 4, Plotting) using the new probabilityMassFunction function to create a graph with a range of 0 to 1,000. To gain access to the LearningDataAnalysis04 and LearningDataAnalysis02 modules (used later in this chapter), use the following GHCi command:

> :l LearningDataAnalysis02 LearningDataAnalysis04 LearningDataAnalysis05

> :m LearningDataAnalysis02 LearningDataAnalysis04 LearningDataAnalysis05

Now, we will plot the function, as follows:

> import Graphics.EasyPlot
> plot (PNG "coinflips.png") $ Function2D [Title "Coin Flip Probabilities"] [Range 0 1000] (\k -> probabilityMassFunction (floor k) 1000 0.5)

The following screenshot shows the result of the preceding command:

[Figure: Coin Flip Probabilities (coinflips.png)]

Note that the data has a sharp peak at the 500 mark. It can be seen that the majority of the peak's width extends somewhere between 400 and 600. The portions of the graph that lie before and after the peak indicate that the probability of occurrence is almost 0. It is not impossible, but it is just highly unlikely. Let's find out the outcome of the most likely event at the peak of this plot:

> probabilityMassFunction 500 1000 0.5
2.52250181783608e-2

The most likely total from all the possible coin flips will happen about 2.5 percent of the time. While this may seem small, it is still larger than the probability of any of the remaining 1,000 possible totals. While we are at it, take note of the entire plot. The plot represents the probability of every individual possible outcome, no matter how insignificant. If we were to add up the probability of every possible outcome, the sum should be exactly 1. We will demonstrate this with Haskell:

> sum $ map (\k -> probabilityMassFunction k 1000 0.5) [0..1000]
1.0

The probability that some outcome will occur is equal to the sum of the probabilities of every possible outcome, and this sum is always equal to 1.

Determining our test interval

We wish to create a test that will allow us to determine whether the result of the 1,000 enchanted coin flips falls within the central 99 percent of the possible fair coin outcomes. I picked 99 percent because the value needs to be convincingly high to show that the enchanted coin is different from a fair coin. This choice of threshold is arbitrary. We need to find the range of outcomes, centered on the peak of the problem's mass function plot, whose probabilities sum to 0.99.

With a bit of trial and error, we will see that a window size of 40 comes close to the 99 percent threshold that we selected. I manually tested ranges until I came across one with a sum close to 0.99. It probably took about four tries to find a result that I liked:

> sum $ map (\k -> probabilityMassFunction k 1000 0.5) [(500-40)..(500+40)]
0.9896118684338442
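The manual search above can also be automated. The following is a minimal sketch that assumes a fair coin and uses a hand-rolled binomial coefficient, so that it runs without the exact-combinatorics package. The names centralMass and smallestWindow are hypothetical, and the local choose is my own stand-in rather than the package's function. The sketch looks for the smallest window that reaches the threshold, so it may settle on a slightly different value than a hand-picked window that is merely close to it.

```haskell
import Data.Ratio ((%))

-- Binomial coefficient in exact Integer arithmetic (local stand-in,
-- not the function from exact-combinatorics).
choose :: Integer -> Integer -> Integer
choose n k = product [n - k + 1 .. n] `div` product [1 .. k]

-- Probability that a fair coin flipped n times shows between
-- (n/2 - w) and (n/2 + w) heads, inclusive. The sum is computed
-- exactly and converted to a Double only at the end.
centralMass :: Integer -> Integer -> Double
centralMass n w = fromRational (inWindow % (2 ^ n))
  where
    mid      = n `div` 2
    inWindow = sum [choose n k | k <- [max 0 (mid - w) .. min n (mid + w)]]

-- Smallest window whose central mass reaches the requested level.
-- Assumes the level is at most 1; otherwise the search never ends.
smallestWindow :: Integer -> Double -> Integer
smallestWindow n level = head [w | w <- [0 ..], centralMass n w >= level]

main :: IO ()
main = do
  print (centralMass 1000 40)       -- about 0.9896, in line with the sum above
  print (smallestWindow 1000 0.99)
```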

Establishing the parameters of the experiment

Finally, we can establish the parameters of our experiment. The null hypothesis is that the enchanted coin will flip heads 500 times (plus or minus 40). The alternative hypothesis is that the enchanted coin will flip heads more than 540 times or fewer than 460 times. Once we perform the experiment and gather the results, we will accept the alternative hypothesis if the number of heads is greater than 540 or less than 460. Otherwise, we fail to reject the null hypothesis.
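The decision rule just described is simple enough to write down as a function. This is a sketch only: the Verdict type and the decide function are names I introduced for illustration.

```haskell
-- Possible conclusions of the test (illustrative type).
data Verdict = FailToRejectNull | AcceptAlternative
  deriving (Show, Eq)

-- Decide from the observed number of heads, given the expected count
-- and the window allowed on either side of it.
decide :: Int -> Int -> Int -> Verdict
decide expected window heads
  | abs (heads - expected) > window = AcceptAlternative
  | otherwise                       = FailToRejectNull

main :: IO ()
main = do
  print (decide 500 40 510)  -- FailToRejectNull
  print (decide 500 40 541)  -- AcceptAlternative
  print (decide 500 40 460)  -- FailToRejectNull (the boundary counts as inside)
```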

Introducing System.Random

We will now begin our actual experiment. To do this, we will not put any magical spells over real coins. We will use Haskell. Though Haskell is a purely functional programming language, it does have the ability to extend outside itself to look for resources, including the ones related to random number generation.

To use the System.Random module, you will first need to create a new random number generator, as follows:

> import System.Random
> g <- newStdGen

Using this new random number generator, we can use the random function to return a pseudorandom value (and a new generator). The random number generation functions can output any numeric type we desire, so I have explicitly stated that I want a Double type. Note that if you make this call a second time with the same generator, you will get the same result thanks to Haskell's side effect-free nature. Here are the results that I get when I make a call to random:

> random g :: (Double, StdGen)
(0.8872828052781477,805351557 696985193)

We can use the randoms function to generate an infinite list of random numbers in the range of 0 to 1. Since you probably do not need an infinite amount of random numbers, use the take function to limit this to your needs. Note that the first number in the list is identical to the previously generated random number. This is not a fluke. I didn't change my random number generation variable between these two calls. Here, we will generate three random values in the range of 0 to 1:

> take 3 $ randoms g :: [Double]
[0.8872828052781477,0.6612757244159314,0.7335027565852938]
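To get a different value from a second call to random, pass along the generator that the first call returned. The following sketch shows both behaviors. It uses mkStdGen with a fixed seed (an arbitrary number that I picked) instead of newStdGen so that the run is repeatable, and twoDraws is a hypothetical helper name.

```haskell
import System.Random (StdGen, mkStdGen, random)

-- Draw two Doubles, passing the generator returned by the first call
-- into the second call.
twoDraws :: StdGen -> (Double, Double)
twoDraws g0 = (x, y)
  where
    (x, g1) = random g0
    (y, _)  = random g1

main :: IO ()
main = do
  let g      = mkStdGen 2024  -- fixed seed, chosen only for illustration
      (a, _) = random g :: (Double, StdGen)
      (b, _) = random g :: (Double, StdGen)
  print (a == b)       -- True: same generator, same value
  print (twoDraws g)   -- the first component equals a
```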

We can also generate random integers that lie between two specified values (inclusive) using the randomRs function. Again, you will need to use the take function to limit this to your needs. Here, we generate 5 random values in the range of 0 to 100:

> take 5 $ randomRs (0,100) g
[62,55,29,69,20]

Performing the experiment

Since our experiment calls for the flipping of a coin 1,000 times (where tails is equal to 0 and heads is equal to 1), we will use the randomRs function to generate 1,000 values on the integer range of 0 to 1 (the speaking of any magical words while issuing this line is optional):

> let coinflips = take 1000 $ randomRs (0, 1) g

What is the final result of this experiment? We will compute the sum of the coinflips value (your result may vary), as follows:

> sum coinflips 
492

Since my result is within the range of 460 to 540, the experiment failed to reject the null hypothesis. There might have been some actual magic in my magical spell, but our experiment is not able to detect any change, and hence, the possibility is left open. If the experiment accepted the alternative hypothesis, it may mean that there is something unique to your system that justifies the result. It may also mean that you should re-run your experiment to be sure.
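One way to get a feel for how often a fair coin lands outside the window is to repeat the experiment many times. The following is a rough sketch under a few assumptions of my own: each run is seeded with mkStdGen and a small integer, the number of runs (200) is arbitrary, and the helper names countHeads and outsideWindow are hypothetical. With a window that covers roughly 99 percent of fair outcomes, only a small handful of runs should be flagged.

```haskell
import System.Random (mkStdGen, randomRs)

-- Count heads in n simulated flips (1 = heads, 0 = tails) for a given seed.
countHeads :: Int -> Int -> Int
countHeads seed n = sum (take n (randomRs (0, 1) (mkStdGen seed)))

-- True when the count falls outside expected plus or minus window.
outsideWindow :: Int -> Int -> Int -> Bool
outsideWindow expected window heads = abs (heads - expected) > window

main :: IO ()
main = do
  let counts  = [countHeads seed 1000 | seed <- [1 .. 200]]
      flagged = length (filter (outsideWindow 500 40) counts)
  putStrLn (show flagged ++ " of " ++ show (length counts)
            ++ " runs fell outside 460..540")
```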
