CHAPTER
11

The Normal Probability Distribution

In This Chapter

  • Examining the properties of a normal probability distribution
  • Using the standard normal table to calculate probabilities of a normal random variable
  • Using Excel to calculate normal probabilities
  • Using the normal distribution as an approximation to the binomial distribution

Now let’s take on a new challenge: continuous random variables and a continuous probability distribution known as the normal distribution. Remember that in Chapter 8 we defined a continuous random variable as one that can assume any numerical value within an interval as a result of measuring the outcome of an experiment. Some examples of continuous random variables are weight, distance, speed, and time.

The normal distribution is a statistician’s workhorse. This distribution is the foundation for many types of inferential statistics that we rely on today. We will continue to refer to this distribution through many of the remaining chapters in this book.

Characteristics of the Normal Probability Distribution

A continuous random variable that follows the normal probability distribution has several distinctive features. Let’s say that the monthly rainfall in inches for a particular city follows the normal distribution with an average of 3.5 inches and a standard deviation of 0.8 inches. The probability distribution for such a random variable is shown in Figure 11.1.

Figure 11.1

Normal probability distribution with a mean = 3.5, a standard deviation = 0.8.

From this figure, we can make the following observations about the normal distribution:

  • The distribution is bell-shaped and symmetrical around the mean.
  • The mean, median, and mode are the same value–3.5 inches–and in this case fall in the exact center of the distribution.
  • The total area under the curve is equal to 1; half of the area is to the right of the mean and half of it is to the left.
  • The distribution is asymptotic, meaning that the left and right tails of the normal probability distribution extend indefinitely, never quite touching the horizontal axis.

The mean and the standard deviation determine the position and the shape of the normal distribution. The standard deviation plays an important role in the shape of the curve. Looking at Figure 11.1, we can see that nearly all of the monthly rainfall measurements would fall between 1.0 and 6.0 inches. Now look at Figure 11.2, which shows the normal distribution with the same mean of 3.5 inches, but with a standard deviation of only 0.5 inches.

Figure 11.2

Normal probability distribution with a mean = 3.5, a standard deviation = 0.5.

Here you see a curve that’s much tighter around the mean. Almost all the rainfall measurements will be between 2.0 and 5.0 inches per month.

The mean of the normal distribution determines the central point of the distribution. As the mean changes, so does the center point of the distribution. Figure 11.3 shows the impact of changing the mean of the distribution to 5.0 inches, leaving the standard deviation at 0.8 inches.

Figure 11.3

Normal probability distribution with a mean = 5.0, a standard deviation = 0.8.

BOB’S BASICS

A smaller standard deviation results in a “skinnier” curve that’s tighter and taller around the mean. A larger standard deviation makes for a “fatter” curve that’s more spread out and not as tall.

In each of the previous figures, the characteristics of the normal probability distribution hold true. In each case, the values of μ—the mean—and σ—the standard deviation—completely describe the position and shape of the distribution.

The probability function for the normal distribution has a particularly mean personality (that pun was surely intended) and is shown as follows:

f(x) = [1 / (σ√(2π))] e^(–(x – μ)² / (2σ²))

I promise you this will be the last you’ll see of this beast. Fortunately, we have other methods for calculating probabilities for this distribution that are more civilized and which we will discuss in the next section.
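If you are curious what the beast looks like in action, here is a minimal Python sketch (Python is my assumption here; it requires version 3.8 or later for the `statistics.NormalDist` class). It evaluates the normal density f(x) = [1 / (σ√(2π))] e^(–(x – μ)² / (2σ²)) at the center of the two rainfall curves from Figures 11.1 and 11.2. The helper name `normal_pdf` is made up for this illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x, computed directly from the formula."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Rainfall examples: same mean of 3.5 inches, different standard deviations.
peak_wide = normal_pdf(3.5, mu=3.5, sigma=0.8)
peak_narrow = normal_pdf(3.5, mu=3.5, sigma=0.5)

# Cross-check the hand-written formula against the standard library.
library_value = NormalDist(mu=3.5, sigma=0.8).pdf(3.5)

print(f"peak height with sigma = 0.8: {peak_wide:.4f}")
print(f"peak height with sigma = 0.5: {peak_narrow:.4f}")
```

The curve with the smaller standard deviation comes out taller at the mean, which matches the "skinnier and taller" description of a tighter distribution.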

Calculating Probabilities for the Normal Distribution

There are a couple of approaches to calculate probabilities for a normal random variable: tables and Excel. Let’s start with the table and then move to Excel after that.

To help us calculate the probability for a normal random variable, statisticians introduced a table that we can use instead of the formula above. The problem that they encountered in creating a table is that each normal distribution has its own mean and standard deviation, so they could not come up with just one table to use. They would have to have many tables, each with its own mean and standard deviation. As you might imagine, this would be very tedious. Instead, statisticians said that we can standardize the normal distribution. This way, we can have just one standard normal table with the same mean and standard deviation. The standard normal distribution has a mean (μ) of 0 and standard deviation (σ) of 1. How did they do that? By using the z-score. To demonstrate, let’s use the following example.

One morning a few years ago, Debbie called Bob on his cell phone while he was out running errands and spoke the two words that he had feared hearing. “They’re back,” she said. “Okay,” he replied somberly, and then hung up the phone and headed straight toward the hardware store. His manhood was being challenged, and he’d be darned if he was going to take this lying down. This was war, and he was going home fully prepared for battle! He was referring to his annual struggle with the most vile, the most dastardly, the most hungry creature that God has ever placed on this planet … the Japanese beetle.

By the time Bob returned home from the hardware store, half of his beautiful plum tree looked like Swiss cheese. He quickly counterattacked with a vengeance, spraying the most potent chemicals money could buy. In the end, after the toxic spray cleared, he stood alone, master of his domain.

Alright, let’s say that the amount of toxic spray Bob uses each year follows a normal distribution with a mean of 60 ounces and a standard deviation of 5 ounces. This means that each year Bob battles with these demons, the most likely amount of spray he uses is 60 ounces, but it will vary year to year. The probability of other amounts above and below 60 ounces will drop off according to the bell-shaped curve. Armed with this information, we are now ready to determine probabilities of various usages each year.

Calculating Probability Using the Z-Score

Because the total area under a normal distribution curve equals 1 and the curve is symmetrical, we can say the probability that Bob uses 60 ounces or more of spray is 50 percent, as is the probability that he uses 60 ounces or less. This is shown in Figure 11.4.

Figure 11.4

Normal probability distribution with a mean = 60, a standard deviation = 5.0.

How would you calculate the probability that Bob would use 64.3 ounces of spray or less the following year? I’m glad you asked. For this task, we need to use the standard normal distribution we mentioned above, which is a normal distribution with μ = 0 and σ = 1, and is shown in Figure 11.5.

Figure 11.5

Standard normal probability distribution with a mean = 0, a standard deviation = 1.0.

DEFINITION

The standard normal distribution is a standardized distribution with a mean equal to 0 and a standard deviation equal to 1.0.

This standard normal distribution is the basis for all normal probability calculations, and we’ll use it throughout this chapter. To standardize the normal distribution, we need to calculate the z-score. What is the z-score? Read on!

The z-score is the number of standard deviations any point is away from the mean. So, our next step is to determine how many standard deviations the value 64.3 is from the mean of 60 and show this value on the standard normal distribution curve. We do this using the following formula:

z = (x – μ) / σ

where:

x = the normally distributed random variable of interest

μ = the mean of the normal distribution

σ = the standard deviation of the normal distribution

z = the number of standard deviations between x and μ, otherwise known as the standard z-score

For our example, the standard z-score is as follows:

z = (64.3 – 60) / 5 = 0.86

Now I know that 64.3 is 0.86 standard deviations away from 60 in my distribution.

TEST YOUR KNOWLEDGE

Can the z-score be negative? Yes! The z-score is negative for all x values to the left of the mean (less than the mean), positive for all x values to the right of the mean, and the z-score is 0 when the x value is the same as the mean.
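The z-score arithmetic is easy to sketch in a few lines of Python (my choice of language, not something this chapter depends on). The function name `z_score` is hypothetical. The example values are Bob's spray distribution with a mean of 60 ounces and a standard deviation of 5 ounces.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

z_above = z_score(64.3, mu=60, sigma=5)    # a value to the right of the mean
z_below = z_score(54, mu=60, sigma=5)      # a value to the left of the mean
z_at_mean = z_score(60, mu=60, sigma=5)    # a value exactly at the mean

print(round(z_above, 2), round(z_below, 2), z_at_mean)
```

Values above the mean give a positive z, values below give a negative z, and the mean itself gives zero.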

Using the Standard Normal Table

Now that we have the standard z-score, we can use the following table to determine the probability that Bob uses 64.3 ounces of toxic spray or less the next year. This table is an excerpt from Appendix B and shows the area of the standard normal curve up to and including certain values of z. Because z = 0.86 in our example, we go to the 0.8 row and the 0.06 column to find the value 0.8051, which is underlined. This is the probability. Yes, it’s that easy!

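[Table: excerpt of the standard normal table from Appendix B. Rows give z to the first decimal place (for example, 0.8); columns give the second digit of z (for example, 0.06).]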

This area is shown graphically in Figure 11.6.

Figure 11.6

The shaded area represents the probability that z will be less than or equal to 0.86.

The probability that the standard z-score will be less than or equal to 0.86 is 80.51 percent. Because:

P(x ≤ 64.3) = P(z ≤ 0.86) = 0.8051

There is an 80.51 percent chance Bob uses 64.3 ounces of spray or less the next year against those evil Japanese beetles. This can be seen in Figure 11.7.

Figure 11.7

The shaded area represents the probability that x will be less than or equal to 64.3 ounces.
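If you don't have the table handy, the same lookup can be sketched in Python, assuming version 3.8 or later so that `statistics.NormalDist` is available. Its `cdf` method returns the area under the curve up to and including a given value, which is the quantity the standard normal table lists.

```python
from statistics import NormalDist

spray = NormalDist(mu=60, sigma=5)   # Bob's yearly spray usage, in ounces
standard = NormalDist()              # standard normal: mean 0, standard deviation 1

# Route 1: standardize first, then use the standard normal curve.
z = (64.3 - 60) / 5
p_from_z = standard.cdf(z)

# Route 2: ask the original distribution directly.
p_from_x = spray.cdf(64.3)

print(f"P(z <= 0.86) = {p_from_z:.4f}")
print(f"P(x <= 64.3) = {p_from_x:.4f}")
```

Both routes give the same probability, which is the whole point of standardizing.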

What about the probability that Bob uses more than 62.5 ounces of spray the next year? Because the standard normal table only has probabilities that are less than or equal to the z-scores, we need to look at the complement to this event.

P(x > 62.5) = 1 – P(x ≤ 62.5)

The z-score now becomes this:

z = (62.5 – 60) / 5 = 0.50

According to the standard normal table:

P(z ≤ 0.50) = 0.6915

But we want:

P(z > 0.50) = 1 – 0.6915 = 0.3085

This probability is shown graphically in Figure 11.8.

Figure 11.8

The shaded area represents the probability that z will be more than 0.50.

Because:

P(x > 62.5) = P(z > 0.50) = 0.3085

There is a 30.85 percent chance that Bob uses more than 62.5 ounces of toxic spray. Beetles beware!
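Here is a small Python sketch of the complement rule, again assuming Python 3.8 or later for `statistics.NormalDist`. The cumulative probability gives the area to the left, so the area to the right is whatever remains out of 1.

```python
from statistics import NormalDist

spray = NormalDist(mu=60, sigma=5)   # Bob's yearly spray usage, in ounces

p_at_most = spray.cdf(62.5)          # P(x <= 62.5), area to the left
p_more_than = 1 - p_at_most          # P(x > 62.5), area to the right

print(f"P(x <= 62.5) = {p_at_most:.4f}")
print(f"P(x > 62.5)  = {p_more_than:.4f}")
```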

What about the probability that Bob uses more than 54 ounces of spray? Again, we need the complement rule, which would be this:

P(x > 54) = 1 – P(x ≤ 54)

The z-score becomes this:

z = (54 – 60) / 5 = -1.20

The negative z-score indicates that we are to the left of the distribution mean. Now we are going to look at the other standard normal distribution table, which has the negative values of z. The table below is an excerpt from Appendix B and shows the negative values of z. In our example, z = -1.20, so we go to the -1.2 row and the 0.00 column to find the value 0.1151, which is underlined.

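[Table: excerpt of the standard normal table from Appendix B for negative values of z. Rows give z to the first decimal place (for example, -1.2); columns give the second digit of z (for example, 0.00).]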

This is the probability that x ≤ 54. What we need is the probability that x > 54, but we know what to do now! Use the complement rule as follows:

P(x > 54) = P(z > -1.2) = 1 – P(z ≤ -1.2) = 1 – 0.1151 = 0.8849

There is an 88.49 percent chance Bob will use more than 54 ounces of spray. This probability is shown graphically in Figure 11.9.

Figure 11.9

The shaded area is the probability that x will be more than 54 ounces.

Finally, let’s look at the probability that Bob uses between 54 and 62.5 ounces of spray the next year. This probability is shown graphically in Figure 11.10.

Figure 11.10

The shaded area is the probability that x will be between 54 and 62.5 ounces.

We know from previous examples that the area to the left of 54 ounces is 0.1151 and that the area to the right of 62.5 ounces is 0.3085. Because the total area under the curve is 1:

P(54 ≤ x ≤ 62.5) = 1 – 0.1151 – 0.3085 = 0.5764

There is a 57.64 percent chance that Bob uses between 54 and 62.5 ounces of spray the next year.
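An interval probability can also be found by subtracting one cumulative area from another. The Python sketch below (assuming Python 3.8 or later; `prob_between` is a hypothetical helper name) computes it both ways: as a difference of two left-hand areas, and by removing both tails from the total area of 1, as in the text.

```python
from statistics import NormalDist

spray = NormalDist(mu=60, sigma=5)   # Bob's yearly spray usage, in ounces

def prob_between(dist, low, high):
    """Area under the curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)

p_interval = prob_between(spray, 54, 62.5)

# Same answer by removing the left tail and the right tail from 1.
left_tail = spray.cdf(54)            # P(x <= 54)
right_tail = 1 - spray.cdf(62.5)     # P(x > 62.5)
p_by_complement = 1 - left_tail - right_tail

print(f"P(54 <= x <= 62.5) = {p_interval:.4f}")
```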

The Area Under the Normal Distribution and the Empirical Rule

The empirical rule (sounds like a decree from the emperor) tells us the area under the normal distribution. It states that if a distribution follows a bell-shaped, symmetrical curve centered around the mean, then we can expect approximately 68.3, 95.5, and 99.7 percent of the values to fall within 1.0, 2.0, and 3.0 standard deviations around the mean, respectively. I’m glad to inform you that we now have the ability to demonstrate these results.

The shaded area in Figure 11.11 shows the percentage of observations that we would expect to fall within 1.0 standard deviation of the mean.

Figure 11.11

The shaded area is the probability that x will be within 1.0 standard deviation of the mean.

Where did 68.3 percent come from? We can look in the standard normal table to get the probability that an observation will be less than one standard deviation from the mean. The probability of the area on the right side (z = +1) is this:

P(z ≤ +1.0) = 0.8413

And the probability of the area on the left side (z = -1) is this:

P(z ≤ -1.0) = 0.1587

So the area between -1.0 and +1.0 is:

P(-1.0 ≤ z ≤ +1.0) = 0.8413 – 0.1587 = 0.6826

The same logic is used to demonstrate the probabilities of 2.0 and 3.0 standard deviations from the mean. I’ll leave those for you to try.
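Once you have tried them by hand, you can check your answers with a short Python sketch (assuming Python 3.8 or later; `area_within` is a made-up helper name). It computes the area within k standard deviations of the mean on the standard normal curve. Because it does not round to four decimal places the way the table does, expect the last digit to differ slightly from the table-based results.

```python
from statistics import NormalDist

standard = NormalDist()   # standard normal: mean 0, standard deviation 1

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard.cdf(k) - standard.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```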

TEST YOUR KNOWLEDGE

The empirical rule is also known as the 68-95-99.7 percent rule. I’m sure you are not surprised where the name comes from!

Calculating Normal Probabilities Using Excel

Once again we can rely on Excel to do some of the grunt work for us. Excel has a built-in function, NORM.DIST, that can calculate the normal probability for us. It has the following characteristics:

NORM.DIST(x, mean, standard_dev, cumulative)

where:

cumulative = FALSE if you want the probability density function (we don’t)

cumulative = TRUE if you want the cumulative probability (we do)

For instance, Figure 11.12 shows the NORM.DIST function being used to calculate the probability that Bob uses less than 64.3 ounces of spray on those nasty beetles the next year.

Figure 11.12

NORM.DIST function in Excel for less than 64.3 ounces.

Cell A1 contains the Excel formula =NORM.DIST(64.3,60,5,TRUE) with the result being 0.8051. This probability is underlined in the standard normal table earlier in the chapter.

BOB’S BASICS

Don’t be alarmed if the values that are returned using the NORM.DIST function in Excel are slightly different from those found in Table 3 in Appendix B. This is due to rounding differences that are small enough to be ignored.
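If you don't have Excel nearby, the behavior of NORM.DIST can be imitated with a rough Python sketch (assuming Python 3.8 or later). The function `norm_dist` below is a hypothetical stand-in written for this illustration, not an Excel or Python built-in; it simply mirrors the four arguments described above.

```python
from statistics import NormalDist

def norm_dist(x, mean, standard_dev, cumulative):
    """Rough stand-in for Excel's NORM.DIST(x, mean, standard_dev, cumulative)."""
    dist = NormalDist(mu=mean, sigma=standard_dev)
    if cumulative:
        return dist.cdf(x)   # area under the curve up to and including x
    return dist.pdf(x)       # height of the curve at x

p = norm_dist(64.3, 60, 5, True)         # probability of 64.3 ounces or less
density = norm_dist(60, 60, 5, False)    # curve height at the mean, not a probability

print(f"{p:.4f}")
```

Setting the last argument to False returns the height of the curve rather than an area, which is why we keep it set to True for probability questions.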

Using the Normal Distribution as an Approximation to the Binomial Distribution

Remember how nasty our friend the binomial distribution can get sometimes? Well, the normal distribution may be able to help us out during these difficult times under the right conditions. Recall from Chapter 9 that the binomial equation will calculate the probability of x successes in n trials with p = the probability of a success for each trial and q = the probability of a failure. If np ≥ 5 and nq ≥ 5, then we can use the normal distribution to approximate the binomial.

As an example, suppose my statistics class is composed of 60 percent females. If I select 15 students at random, what is the probability that this group will include 8, 9, 10, or 11 female students? For this example, n = 15; p = 0.6; q = 0.4; and x = 8, 9, 10, and 11. We can use the normal approximation because np = (15)(0.6) = 9 and nq = (15)(0.4) = 6. (Sorry, guys. I didn’t mean to imply picking you would be classified a failure!)

Also recall from Chapter 9 that the mean and standard deviation of this binomial distribution are as follows:

μ = np = (15)(0.6) = 9

σ = √(npq) = √((15)(0.6)(0.4)) = √3.6 = 1.897

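The probability that the group of 15 students will include 8, 9, 10, or 11 female students can be calculated using the following equations:

P(8 ≤ x ≤ 11) = P(x = 8) + P(x = 9) + P(x = 10) + P(x = 11)

where each term is P(x) = [n! / ((n – x)! x!)] p^x q^(n – x) with n = 15, p = 0.6, and q = 0.4.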

Now let’s solve this problem using the normal distribution and compare the results. Figure 11.13 shows the normal distribution with μ = 9 and σ = 1.897.

Figure 11.13

The normal approximation to the binomial distribution.

Notice that the shaded interval goes from 7.5 to 11.5 rather than 8 to 11. Don’t worry; I didn’t make a mistake. I subtracted 0.5 from 8 and added 0.5 to 11 to compensate for the fact that the normal distribution is continuous and the binomial is discrete. Adding and subtracting 0.5 is known as the continuity correction factor. For larger values of n, like 100 or more, you can ignore this correction factor.

Now we need to calculate the z-scores.

z = (7.5 – 9) / 1.897 = -0.79

z = (11.5 – 9) / 1.897 = +1.32

According to the standard normal table:

P(z ≤ +1.32) = 0.9066

And P(z ≤ -0.79) = 0.2148

The probability of interest for this example is the area between z-scores of -0.79 and +1.32. We can use the following calculations to find this area:

P(-0.79 ≤ z ≤ +1.32) = 0.9066 – 0.2148 = 0.6918

This probability is shown in the shaded area in Figure 11.14.

Figure 11.14

The shaded area is the probability that z will be between -0.79 and +1.32 standard deviations from the mean.

Using the normal distribution, we have determined the probability that my group of 15 students will contain 8, 9, 10, or 11 females is 0.6918. As you can see, this probability is very close to the result we obtained using the binomial equations, which was 0.6964.
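To compare the two approaches side by side, here is a minimal Python sketch (assuming Python 3.8 or later for `math.comb` and `statistics.NormalDist`; the helper name `binomial_pmf` is hypothetical). It adds up the binomial probabilities for 8 through 11 females, then computes the normal approximation over 7.5 to 11.5 using the continuity correction. It works with unrounded z-scores, so the approximation may differ from the table-based figure in the fourth decimal place.

```python
import math
from statistics import NormalDist

n, p = 15, 0.6
q = 1 - p

def binomial_pmf(x, n, p):
    """Binomial probability of exactly x successes in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Binomial probability of 8, 9, 10, or 11 successes.
exact = sum(binomial_pmf(x, n, p) for x in range(8, 12))

# Normal approximation with the same mean and standard deviation.
mu = n * p
sigma = math.sqrt(n * p * q)
approx_dist = NormalDist(mu, sigma)

# Continuity correction: widen the interval by 0.5 on each side.
approx = approx_dist.cdf(11.5) - approx_dist.cdf(7.5)

print(f"binomial: {exact:.4f}")
print(f"normal approximation: {approx:.4f}")
```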

Well, this ends our chapter on the normal probability distribution. Now we are prepared to dig deeper into inferential statistics!

Practice Problems

1. The speed of cars passing through a checkpoint follows a normal distribution with μ = 62.6 miles per hour and σ = 3.7 miles per hour. What is the probability that the next car passing will …

   a. Be exceeding 65.5 miles per hour?

   b. Be exceeding 58.1 miles per hour?

   c. Be between 61 and 70 miles per hour?

2. The selling price of various homes in a community follows the normal distribution with μ = $176,000 and σ = $22,300. What is the probability that the next house will sell for …

   a. Less than $190,000?

   b. Less than $158,000?

   c. Between $150,000 and $168,000?

3. The age of customers for a particular retail store follows a normal distribution with μ = 37.5 years and σ = 7.6 years. What is the probability that the next customer who enters the store will be …

   a. More than 31 years old?

   b. Less than 42 years old?

   c. Between 40 and 45 years old?

4. A coin is flipped 14 times. Use the normal approximation to the binomial distribution to calculate the probability of a total of 4, 5, or 6 heads. Compare this to the binomial probability.

5. A certain statistics author’s golf scores follow the normal distribution with a mean of 92 and a standard deviation of 4. What is the probability that, during his next round of golf, his score will be …

   a. More than 97?

   b. More than 90?

6. The number of text messages that Debbie’s son Jeff sends and receives a month follows the normal distribution with a mean of 4,580 (I am not making this up!) and a standard deviation of 550. What is the probability that next month he will send and receive …

   a. Between 4,000 and 5,000 text messages?

   b. Less than 4,200 text messages?

7. A data set that follows a bell-shaped, symmetrical distribution has a mean equal to 75 and a standard deviation equal to 10. What range of values centered around the mean would represent 95.5 percent of the data points?

The Least You Need to Know

  • The normal distribution is bell-shaped and symmetrical around the mean.
  • The total area under the normal distribution curve is equal to 1.0.
  • The normal distribution tables are based on the standard normal distribution where μ = 0 and σ = 1.
  • The number of standard deviations between a normally distributed random variable, x, and μ is known as the standard z-score and can be found with z = (x – μ) / σ.
  • The empirical rule states that if a distribution follows a bell-shaped, symmetrical curve centered around the mean, then we can expect approximately 68.3, 95.5, and 99.7 percent of the values to fall within one, two, and three standard deviations around the mean, respectively.
  • Excel has a built-in function, NORM.DIST, that you can use to perform normal distribution calculations.
  • You can use the normal distribution to approximate the binomial distribution when np ≥ 5 and nq ≥ 5.