Bayesian versus Frequentist

The preceding example was almost too easy. In practice, we can hardly ever truly count the number of ways something can happen. For example, say we want to know the probability that a random person smokes cigarettes at least once a day. If we tried to approach this problem the classical way (using the previous formula), we would need to figure out how many different ways a person could be a smoker, that is, someone who smokes at least once a day, which is simply not possible!

When faced with such a problem, two main schools of thought come into play for calculating probabilities in practice: the Frequentist approach and the Bayesian approach. This chapter will focus heavily on the Frequentist approach, while the subsequent chapter will dive into the Bayesian approach.

Frequentist approach

In a Frequentist approach, the probability of an event is calculated through experimentation. It uses the past in order to predict the future chance of an event. The basic formula is as follows:

P(A) ≈ (number of times A occurred) / (number of times the procedure was repeated)

Basically, we run the procedure many times, count the number of trials in which A occurred, and take the ratio of these two counts as an approximation of the probability.
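For instance, here is a minimal sketch in Python that estimates the probability of rolling a 6 on a fair die purely by experimentation (the die example, trial count, and variable names are our own illustrative assumptions, not from the original text):

import numpy as np

trials = 100000                                        # how many times we repeat the procedure
rolls = np.random.randint(low=1, high=7, size=trials)  # simulate rolls of a fair die (1 through 6; high is exclusive)
times_A_occurred = (rolls == 6).sum()                  # count the trials in which the event A (rolling a 6) occurred
print (times_A_occurred / float(trials))               # relative frequency, close to the true 1/6 = 0.1667

Run it a few times; the answer wobbles around 0.167, which is exactly the ratio the preceding formula describes.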

The Bayesian approach differs in that it dictates that probabilities must be discerned through theoretical means. Using the Bayesian approach, we have to think a bit more critically about events and why they occur. Neither methodology is wholly correct all of the time. Usually, it comes down to the problem at hand and the difficulty of applying either approach.

The crux of the Frequentist approach is the relative frequency.

The relative frequency of an event is how often an event occurs divided by the total number of observations.

Example – marketing stats

Let's say that you are interested in ascertaining how often a visitor to your website is likely to return on a later date. This is sometimes called the rate of repeat visitors. Using the previous definition, we would define our event A as a visitor coming back to the site. We would then have to count the number of ways a person can come back, which doesn't really make sense at all! In this case, many people would turn to a Bayesian approach; however, we can instead calculate the event's relative frequency.

So, in this case, we can take the visitor logs and calculate the relative frequency of event A (repeat visitors). Let's say, of the 1,458 unique visitors in the past week, 452 were repeat visitors. We can calculate this as follows:

P(A) ≈ 452 / 1,458 ≈ 0.31

So, about 31% of your visitors are repeat visitors.
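As a quick check, the same arithmetic in Python (the counts come from the example above; the variable names are just for illustration):

repeat_visitors = 452        # visitors who came back this week
unique_visitors = 1458       # total unique visitors this week
print (repeat_visitors / float(unique_visitors))  # 0.3099..., or about 31%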

The law of large numbers

The reason the Frequentist approach can get away with this is the law of large numbers, which states that if we repeat a procedure over and over, the relative frequency will approach the actual probability. Let's demonstrate this using Python.

If I were to ask you for the average of a number picked at random between 1 and 10, you would very quickly answer around 5. To see why, we will have Python choose n random numbers between 1 and 10 and find their average.

We will repeat this experiment several times using a larger n each time, and then we will graph the outcome. The steps are as follows:

  1. Pick a random number between 1 and 10 and find the average
  2. Pick two random numbers between 1 and 10 and find their average
  3. Pick three random numbers between 1 and 10 and find their average
  4. Pick 10,000 random numbers between 1 and 10 and find their average
  5. Graph the results

Let's take a look at the code:

import numpy as np 
import pandas as pd 
from matplotlib import pyplot as plt 
%matplotlib inline 
results = [] 
for n in range(1,10000): 
    nums = np.random.randint(low=1,high=10, size=n) # choose n integers from 1 to 9 (high is exclusive), whose true average is 5 
    mean = nums.mean()                              # find the average of these numbers 
    results.append(mean)                            # add the average to a running list 
     
# POP QUIZ: How large is the list results? 
len(results) # 9999 
# This was tricky: range(1, 10000) starts at 1 and excludes 10000, so it yields 9,999 values 
df = pd.DataFrame({ 'means' : results}) 
print (df.head()) # the averages in the beginning are all over the place! 
# means 
# 9.0 
# 5.0 
# 6.0 
# 4.5 
# 4.0 
print (df.tail()) # as n, our sample size, increases, the averages get closer to 5! 
# means 
# 4.998799 
# 5.060924 
# 4.990597 
# 5.008802 
# 4.979198 
df.plot(title='Law of Large Numbers') 
plt.xlabel("Number of throws in sample") 
plt.ylabel("Average Of Sample") 
[Figure: Law of Large Numbers plot — the average of the sample converges toward 5 as the number of throws in the sample increases]

Cool, right? What this essentially shows is that as we increase the sample size of our relative frequency, the sample average gets closer and closer to the true average of 5.

In our statistics chapters, we will work to define this law much more rigorously, but for now, just know that it is used to link the relative frequency of an event to its actual probability.
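To connect the two ideas directly, here is one more minimal sketch (again using an assumed die-rolling event, not an example from the original text) showing the relative frequency of an event approaching its actual probability as the number of observations grows:

import numpy as np

for trials in [10, 100, 1000, 10000, 100000]:
    rolls = np.random.randint(low=1, high=7, size=trials)  # integers 1 through 6
    print (trials, (rolls == 6).mean())                    # relative frequency of rolling a 6

As trials grows, the printed relative frequency settles toward the actual probability of 1/6 ≈ 0.167, just as the sample averages above settled toward 5.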
