CHAPTER 13

Sampling Distributions

In This Chapter

  • Using sampling distributions of the mean and proportion
  • Working with the central limit theorem
  • Using the standard error of the sampling mean and proportion

In Chapter 12, we praised the wonders of using samples in our statistical analysis because sampling is more efficient than measuring an entire population. In this chapter, we’ll discover another benefit of using samples: sampling distributions.

Sampling distributions describe how sample averages behave. You may be surprised to hear that they behave very well, even better than the populations from which they are drawn. Good behavior means we can do a pretty good job of predicting future values of sample means with just a little information. This might sound a little puzzling now, but by the end of this chapter you’ll be shaking your head in utter amazement.

What Is a Sampling Distribution?

A sampling distribution is a table with two columns: the values of a sample statistic (such as the sample mean) in one column and their corresponding probabilities in the other. It is very similar to the probability distribution we talked about in Chapter 8; the only difference is the variable. Here we record the values of a sample statistic instead of the values of the original random variable, as we did before.

For example, let’s say I want to perform a study to determine the number of miles the average person drives a car in one day. Because it’s not possible to measure the driving patterns of every person in the population, I randomly choose a sample of 10 qualified individuals (n = 10) and record how many miles they drove yesterday. I then choose another 10 drivers and record the same information. I do this three more times, for five samples in all, with the results shown in the following table.

Sample Number | Average Number of Miles (Sample Mean)
--- | ---
1 | 40.4
2 | 76.0
3 | 58.9
4 | 43.6
5 | 62.6

As you can see, each sample has its own mean value, and each value is different. We can continue this experiment by selecting many more samples and observing the pattern of sample means. This pattern of sample means represents the sampling distribution for the number of miles the average person drives in one day.
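The repeated-sampling idea above is easy to mimic in code. The following Python sketch is only an illustration: we don’t have the real driving population, so it ASSUMES daily miles follow an exponential distribution with a mean of 50 miles, and the helper `draw_sample_means` is a hypothetical name. It draws samples of 10 drivers and records each sample mean; the list of many such means is a simulated sampling distribution.

```python
import random
import statistics

def draw_sample_means(num_samples, n, population_mean=50.0, seed=1):
    """Draw num_samples samples of size n and return the mean of each one.

    ASSUMPTION for illustration only: daily miles driven are modeled as
    exponential with the given population mean.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # One sample: n drivers, each reporting yesterday's miles.
        sample = [rng.expovariate(1 / population_mean) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

# Five samples of 10 drivers, like the table above (different numbers, of course).
five_means = draw_sample_means(num_samples=5, n=10, seed=1)
print([round(m, 1) for m in five_means])

# Many more samples: the pattern of these means is the sampling distribution.
many_means = draw_sample_means(num_samples=10_000, n=10, seed=2)
print(round(statistics.mean(many_means), 1), round(statistics.stdev(many_means), 1))
```

Because the population is made up, the numbers will not match the table; only the procedure (sample, average, repeat) is the point.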

Sampling Distribution of the Sample Means

The distribution from the previous example represents the sampling distribution of the sample means because the mean of each sample was the measurement of interest. Why do we care about the sampling distribution? Good question! We care because it has interesting properties that help us in inferential statistics. If we sample with replacement from a population and take all possible samples of a given size (Do you remember the counting rules? They are how we would find all the possible samples of a certain size from a population.), then the sampling distribution will have these cool features:

  • The mean of this sampling distribution will be exactly equal to the population mean. Amazing, isn’t it? But wait until you see the next one.
  • The sampling distribution of the sample mean will have a smaller standard deviation than the population from which the samples were drawn (for any sample size larger than 1). In other words, the sampling distribution is narrower than the distribution for the population. Moreover, the standard deviation of this sampling distribution equals the population standard deviation divided by the square root of the sample size, n.

Let’s explore these features in detail and use an example to clarify. Just for simplicity and to make the calculations easier, let’s say I have a small class of five students with the following grades on a statistics exam (so my entire population consists of these five grades): 90, 88, 86, 84, and 82. Now I want to get all possible samples of, say, 2 students, drawn with replacement from this class. To do that, we use the fundamental counting rule from Chapter 7: there are 5 choices for the first grade and 5 for the second, so we get 5 × 5 = 25 samples. The following table lists all 25 samples of 2 grades and the corresponding sample mean of each one.

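First Grade | Second Grade | Sample Mean ($\bar{x}$)
--- | --- | ---
90 | 90 | 90
90 | 88 | 89
90 | 86 | 88
90 | 84 | 87
90 | 82 | 86
88 | 90 | 89
88 | 88 | 88
88 | 86 | 87
88 | 84 | 86
88 | 82 | 85
86 | 90 | 88
86 | 88 | 87
86 | 86 | 86
86 | 84 | 85
86 | 82 | 84
84 | 90 | 87
84 | 88 | 86
84 | 86 | 85
84 | 84 | 84
84 | 82 | 83
82 | 90 | 86
82 | 88 | 85
82 | 86 | 84
82 | 84 | 83
82 | 82 | 82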

Let’s check those two cool features:

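First we’ll calculate the means. The population mean is

$$\mu = \frac{\sum x}{N} = \frac{90 + 88 + 86 + 84 + 82}{5} = \frac{430}{5} = 86$$

The mean of all the sample means in the last column of the table is

$$\mu_{\bar{x}} = \frac{\sum \bar{x}}{25} = \frac{2150}{25} = 86$$

That’s 86, also!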

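If you’re not surprised yet, look at this second feature! Recall the equation for finding the population standard deviation:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

Using this, the standard deviation of the population is $\sigma = \sqrt{\frac{16 + 4 + 0 + 4 + 16}{5}} = \sqrt{8} \approx 2.83$.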

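The standard deviation of all the sample means in the last column, using

$$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x} - \mu_{\bar{x}})^2}{25}}$$

is 2.00 (the squared deviations add up to 100, and $\sqrt{100/25} = 2$).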

Look at this: divide the standard deviation of the population by $\sqrt{n}$ and you get $\frac{\sigma}{\sqrt{n}} = \frac{2.83}{\sqrt{2}} = 2.00$, which we know is also the standard deviation of the sample means! Ta da! Please hold your applause until the end of the book!

Now you know why we statisticians are very fond of sampling distributions!
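If you’d like to check both features without a calculator, this Python sketch enumerates all 25 samples with `itertools.product` (the counting rule in code) and applies the population formulas, which divide by the number of values, exactly as in the hand calculation above. The helper names `pop_mean` and `pop_sd` are just illustrative.

```python
import itertools
import math

grades = [90, 88, 86, 84, 82]   # the whole population (N = 5)
n = 2                           # sample size

def pop_mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    # Population standard deviation: divide by the count, not count - 1.
    mu = pop_mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

# All ordered samples of size 2 drawn with replacement: 5 * 5 = 25 of them.
samples = list(itertools.product(grades, repeat=n))
sample_means = [sum(s) / n for s in samples]

print(len(samples))                              # 25
print(pop_mean(grades), pop_mean(sample_means))  # 86.0 86.0
print(round(pop_sd(grades), 2))                  # 2.83
print(round(pop_sd(sample_means), 2))            # 2.0
print(round(pop_sd(grades) / math.sqrt(n), 2))  # 2.0
```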

DEFINITION

The sampling distribution of the sample means is a table with two columns: the sample means in one column and the probability of each sample mean in another column.

Mean of the Sampling Distribution of the Sample Means

Using our previous example, we can get the sampling distribution of the sample means by listing all the sample means in one column and the probability of each one in the second column, as follows:

Sampling Distribution of the Sample Mean (n = 2)

Sample Mean ($\bar{x}$) | Frequency | Probability (Relative Frequency)
--- | --- | ---
82 | 1 | 0.04
83 | 2 | 0.08
84 | 3 | 0.12
85 | 4 | 0.16
86 | 5 | 0.20
87 | 4 | 0.16
88 | 3 | 0.12
89 | 2 | 0.08
90 | 1 | 0.04

We can also display this sampling distribution graphically, as shown in Figure 13.1.

Figure 13.1

Sampling distribution of the sample means for n = 2.
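The frequency table can be rebuilt by counting how often each sample mean occurs among the 25 samples and dividing by 25. A short Python sketch using `collections.Counter`:

```python
from collections import Counter
from itertools import product

grades = [90, 88, 86, 84, 82]
samples = list(product(grades, repeat=2))   # 25 samples of size 2
sample_means = [sum(s) / 2 for s in samples]

freq = Counter(sample_means)                # frequency of each sample mean
total = len(sample_means)

# Sampling distribution: each sample mean with its relative frequency.
print("Mean  Freq  Prob")
for mean in sorted(freq):
    prob = freq[mean] / total
    print(f"{mean:4.0f}  {freq[mean]:4d}  {prob:.2f}")
```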

WRONG NUMBER

Students often confuse the sample size, n, with the number of samples. In the previous example, the sample size equals 2 (n = 2), and the number of samples equals 25. In other words, we have 25 samples, each with a sample size of 2.

In the previous example we showed that the mean of the sampling distribution is the same as the population mean. But if you still doubt it, let’s show it another way. Applying the same formula for the mean of a probability distribution (which we learned in Chapter 8) to this sampling distribution gives us the following:

$$\mu_{\bar{x}} = \sum_{i=1}^{N} \bar{x}_i \, P(\bar{x}_i)$$

where:

$\mu_{\bar{x}}$ = the mean of the sampling distribution

$\bar{x}_i$ = the ith sample mean

$P(\bar{x}_i)$ = the probability of the ith sample mean

N = the number of different sample means listed in the table (9 in this example)

The table below shows the calculation for the mean using the previous formula:

Sample Mean ($\bar{x}_i$) | $P(\bar{x}_i)$ | $\bar{x}_i P(\bar{x}_i)$
--- | --- | ---
82 | 0.04 | 3.28
83 | 0.08 | 6.64
84 | 0.12 | 10.08
85 | 0.16 | 13.6
86 | 0.20 | 17.2
87 | 0.16 | 13.92
88 | 0.12 | 10.56
89 | 0.08 | 7.12
90 | 0.04 | 3.6

$$\mu_{\bar{x}} = 3.28 + 6.64 + 10.08 + 13.6 + 17.2 + 13.92 + 10.56 + 7.12 + 3.6 = 86$$

As you can see, the answer is again 86. Now you’ll always believe me! The mean of the sampling distribution of the sample means is the same as the population mean.
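The same expected-value arithmetic in Python, a minimal sketch that takes the probabilities straight from the table (the dictionary name `sampling_dist` is just illustrative):

```python
# The sampling distribution from the table: sample mean -> probability.
sampling_dist = {82: 0.04, 83: 0.08, 84: 0.12, 85: 0.16, 86: 0.20,
                 87: 0.16, 88: 0.12, 89: 0.08, 90: 0.04}

# Mean of a probability distribution: sum of x-bar_i * P(x-bar_i).
mu_xbar = sum(xbar * p for xbar, p in sampling_dist.items())
print(round(mu_xbar, 2))   # 86.0
```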

Standard Error of the Sampling Distribution of the Sample Means

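In the previous example, we saw that the standard deviation of the sampling distribution equals $\sigma/\sqrt{n}$. To show it to you in a different way, let’s apply to this example the variance and standard deviation formulas for a probability distribution that we learned in Chapter 8:

$$\sigma_{\bar{x}}^2 = \sum_{i=1}^{N} (\bar{x}_i - \mu_{\bar{x}})^2 \, P(\bar{x}_i)$$

$$\sigma_{\bar{x}} = \sqrt{\sigma_{\bar{x}}^2}$$

where:

$\sigma_{\bar{x}}^2$ = the variance of the sampling distribution of the sample means

$\sigma_{\bar{x}}$ = the standard deviation of the sampling distribution of the sample means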

The following table shows the calculation for the standard deviation using the previous formula:

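Sample Mean ($\bar{x}_i$) | $P(\bar{x}_i)$ | $(\bar{x}_i - \mu_{\bar{x}})^2 P(\bar{x}_i)$
--- | --- | ---
82 | 0.04 | 0.64
83 | 0.08 | 0.72
84 | 0.12 | 0.48
85 | 0.16 | 0.16
86 | 0.20 | 0
87 | 0.16 | 0.16
88 | 0.12 | 0.48
89 | 0.08 | 0.72
90 | 0.04 | 0.64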

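Adding up the last column gives $\sigma_{\bar{x}}^2 = 0.64 + 0.72 + 0.48 + 0.16 + 0 + 0.16 + 0.48 + 0.72 + 0.64 = 4.00$, so $\sigma_{\bar{x}} = \sqrt{4.00} = 2.00$. Just like the mean, the answer is the same as the one we obtained earlier. Therefore, we can calculate the standard deviation of the sampling distribution of the sample means as follows:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$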

where:

$\sigma_{\bar{x}}$ = the standard deviation of the sampling distribution of the sample means

σ = the standard deviation of the population

n = sample size

Just to throw one more term at you, the standard deviation of the sampling distribution (what we just calculated) is also known as the standard error of the mean.

DEFINITION

The standard error of the mean is the standard deviation of the sampling distribution of the sample means and can be determined by $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
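Here is a Python sketch that repeats the variance calculation from the table and compares the result with the σ/√n shortcut. The population standard deviation is entered as √8 because the squared deviations of the five grades add up to 40 and 40/5 = 8.

```python
import math

sampling_dist = {82: 0.04, 83: 0.08, 84: 0.12, 85: 0.16, 86: 0.20,
                 87: 0.16, 88: 0.12, 89: 0.08, 90: 0.04}
mu_xbar = sum(x * p for x, p in sampling_dist.items())

# Variance of a probability distribution: sum of (x-bar_i - mu)^2 * P(x-bar_i).
var_xbar = sum((x - mu_xbar) ** 2 * p for x, p in sampling_dist.items())
se_from_table = math.sqrt(var_xbar)

# Shortcut: population standard deviation divided by sqrt(n).
sigma = math.sqrt(8)   # population SD of the grades 90, 88, 86, 84, 82 (about 2.83)
n = 2
se_from_formula = sigma / math.sqrt(n)

print(round(var_xbar, 2), round(se_from_table, 2), round(se_from_formula, 2))  # 4.0 2.0 2.0
```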

BOB’S BASICS

Students often confuse σ and $\sigma_{\bar{x}}$. The symbol σ, the standard deviation of the population, measures the variation within the population and was discussed in Chapter 5. The symbol $\sigma_{\bar{x}}$, the standard error, measures the variation of the sample means and decreases as the sample size increases.

I’m sure by now your highly inquisitive mind is screaming, “What happens to the sampling distribution if we increase the sample size?” That’s an excellent question, which we will address in the next section.

The Central Limit Theorem

As we mentioned earlier, sample means behave in a very special way. According to the central limit theorem, as the sample size, n, gets larger, the sample means tend to follow a normal probability distribution. This holds true regardless of the distribution of the population from which the sample was drawn. Amazing, you say.

DEFINITION

According to the central limit theorem, as the sample size, n, gets larger, the sampling distribution of the sample means tends to follow a normal probability distribution with a mean equal to the true population mean, μ, and standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. This holds true regardless of the distribution of the population from which the sample was drawn.

The central limit theorem is very important in statistics; as Bob called it, it’s “the mother of all theorems.” The central limit theorem assures us that if our samples are large enough (n ≥ 30 is the usual rule of thumb), then the sampling distribution will be approximately normal regardless of the distribution of the population itself. So if the population from which the samples were drawn is not normal, or if we simply don’t know whether it is normal or not, we can still rely on the central limit theorem as long as each sample has 30 or more observations. Note, however, that if the population from which the samples were drawn is normally distributed, then the sampling distribution of the sample means is also normally distributed without the need for the central limit theorem (in other words, it works for any sample size).
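A quick simulation makes the theorem less abstract. This Python sketch ASSUMES a strongly skewed population (exponential, with mean 1 and standard deviation 1), draws 20,000 samples of size n = 30, and compares the mean and spread of the sample means with μ and σ/√n. As a rough normality check, it also counts how many sample means land within 1.96 standard errors of μ, which should be close to 95 percent for a normal curve. The seed and sample counts are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(42)

# ASSUMED population: exponential with mean 1 and standard deviation 1,
# which is strongly right-skewed (nothing like a bell curve).
mu, sigma = 1.0, 1.0
n = 30                 # sample size
num_samples = 20_000   # how many samples we draw

sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

se = sigma / math.sqrt(n)
print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("SD of sample means:  ", round(statistics.pstdev(sample_means), 3))
print("sigma / sqrt(n):     ", round(se, 3))

# If the sample means are roughly normal, about 95% fall within 1.96 SE of mu.
within = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / num_samples
print("share within 1.96 SE:", round(within, 3))
```

With the fixed seed the run is reproducible; change `n` to watch the standard deviation of the sample means track σ/√n.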

Remember how we said the sampling distribution behaves very well? To see an example of its good behavior, compare the distribution of the population with the sampling distribution of the sample means. Since $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and $\sqrt{n} > 1$ whenever n > 1, then $\sigma_{\bar{x}} < \sigma$. You say, “So what?” Well, it means that the sampling distribution is less dispersed around the mean (in other words, its values lie closer to the mean) than the distribution for the population from which the samples were drawn. That’s really good behavior, isn’t it? This is clear in Figures 13.2 and 13.3. Figure 13.2 shows the normal distribution for the population in our example, whereas Figure 13.3 shows the normal distribution for the sampling distribution in our example. You can see the sampling distribution is skinnier than the population distribution! Moreover, as the sample size (n) increases, $\sigma_{\bar{x}}$ gets smaller, so the sampling distribution clusters even more tightly around the mean: it keeps getting skinnier!

Figure 13.2

The normal distribution for the population in the grade example.

Figure 13.3

The normal distribution for the sampling distribution in the grade example.
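To see the skinnier-and-skinnier effect in numbers, this Python sketch evaluates σ/√n for the grade population (σ = √8 ≈ 2.83) at several sample sizes; `standard_error` is an illustrative helper name. Notice that quadrupling n cuts the standard error in half.

```python
import math

sigma = math.sqrt(8)   # population SD of the five grades, about 2.83

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n in (1, 2, 4, 16, 64):
    print(n, round(standard_error(sigma, n), 3))
```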

Putting the Central Limit Theorem to Work

I can just sense your need right now to do something really neat with this wonderful new tool. Look no further. If we know the sampling distribution of the sample means follows the normal probability distribution and we also know the mean and standard deviation of that distribution, we can predict the likelihood that the sample means will be greater or less than certain values.

To clarify, let’s look at an example. According to MyFICO.com, the average FICO score in the United States in October 2012 was 689. Because I don’t know if FICO scores are normally distributed or not, I can’t calculate the probability that the FICO score for any single individual is less than a specific value, say 695. Using the central limit theorem, however, I can calculate the probability that a randomly selected sample of, say, 30 individuals will have an average FICO score of less than 695. How can I do that? With my large sample, the central limit theorem assures me that the sampling distribution of the sample mean is normally distributed with:

$$\mu_{\bar{x}} = \mu = 689$$

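Now we need to calculate $\sigma_{\bar{x}}$. Let’s assume that the population standard deviation of FICO scores is σ = 15. Then:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{30}} \approx 2.74$$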

Knowing that the sampling distribution of the sample means follows a normal distribution, we can use the standard normal distribution table to calculate the probability. As we did in Chapter 11, we need to calculate the z-score. The equation looks slightly different because we are working with sample means, but in reality, it is identical to what we saw in Chapter 11. We just replace x with $\bar{x}$ and use its mean and standard deviation instead, as follows:

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{695 - 689}{2.74} \approx 2.19$$

Using the standard normal z-table in Appendix B:

$$P(\bar{x} < 695) = P(z < 2.19) = 0.9857$$

This probability is shown in Figure 13.4.

Figure 13.4

Probability that the sample mean for FICO score is less than 695.

According to the shaded area, the probability that the sample mean for the FICO score is less than 695 is 98.6 percent.
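The FICO calculation can be reproduced with Python’s standard library; `statistics.NormalDist().cdf` plays the role of the z-table in Appendix B. As in the text, σ = 15 is an assumed value.

```python
import math
from statistics import NormalDist

mu = 689      # mean FICO score (MyFICO.com, October 2012)
sigma = 15    # ASSUMED population standard deviation, as in the text
n = 30        # sample size

se = sigma / math.sqrt(n)   # standard error of the mean, about 2.74
z = (695 - mu) / se         # about 2.19

# P(sample mean < 695): the standard normal CDF evaluated at z.
prob = NormalDist().cdf(z)
print(round(se, 2), round(z, 2), round(prob, 3))   # 2.74 2.19 0.986
```

Because the code does not round z to two decimals before the lookup, its probability can differ from the table value in the fourth decimal place.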

As you can see, the power of the central limit theorem lies in the fact that you need little information about the distribution of the population to apply it. The sample means will behave very nicely as long as the sample size is large enough. It’s a very versatile theorem that has countless applications in the real world. I knew you’d be impressed!

BOB’S BASICS

The central limit theorem is one of the most powerful concepts for inferential statistics. It forms the foundation for many statistical models that are used today, so it’s a good idea to cozy up to this theorem.

Sampling Distribution of the Proportion

The sample mean is not the only sample statistic we can measure. What if I want to measure the sample proportion instead? For example, I want to know the percentage of working people who are satisfied with their job. So I collect a random sample of 300 workers and ask each of them whether they are satisfied with their job or not. Because each respondent has only two choices (satisfied or not satisfied), the number of satisfied workers in the sample follows the binomial probability distribution, which we discussed in Chapter 9. We also saw that we can use the normal distribution to approximate the binomial distribution, and that’s what we are going to do here. So let’s look at it step by step.

Calculating the Sample Proportion

My measurement of interest is the proportion of workers in my sample of size n who are satisfied with their job. The sample proportion, $\hat{p}$, is calculated by:

$$\hat{p} = \frac{x}{n}$$

where:

x: number of successes

n: sample size

Note that the sample proportion is denoted by $\hat{p}$, while the population proportion is denoted by p. According to the Conference Board, in 2013, 47.7 percent of workers were satisfied with their job. So, p = 0.477.

As you recall from Chapter 11, we can use the normal probability distribution to approximate the binomial distribution if np ≥ 5 and nq ≥ 5 (q = 1 – p, the probability of a failure). I can check both conditions as follows:

np = (300)(0.477) = 143.10

nq = (300)(1 – 0.477) = 156.90

Both values are well above 5, so the normal approximation applies.
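The rule-of-thumb check is simple enough to wrap in a tiny helper. `normal_approx_ok` is a hypothetical name for illustration, and the threshold of 5 is the one used in this book.

```python
def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb from the text: np >= 5 and nq >= 5, where q = 1 - p."""
    q = 1 - p
    return n * p >= threshold and n * q >= threshold

n, p = 300, 0.477
print(round(n * p, 2), round(n * (1 - p), 2), normal_approx_ok(n, p))  # 143.1 156.9 True
```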

WRONG NUMBER

It’s important to remember that a proportion, either p or $\hat{p}$, cannot be less than 0 or greater than 1. A common mistake students make: when told that the proportion equals 10 percent, they set p = 10 rather than p = 0.10.

Just like the sample mean, I can get the sampling distribution of the sample proportions by collecting different samples of the same size and computing the proportion for each sample. This sampling distribution of the proportion, just like the sampling distribution of the sample means, will approximately follow the normal distribution (as long as np ≥ 5 and nq ≥ 5), and the mean of these sample proportions will, yes, you guessed it, be the same as the population proportion, p.

Calculating the Standard Error of the Proportion

I now need to calculate the standard deviation of this sampling distribution, which is known as the standard error of the proportion, or $\sigma_{\hat{p}}$, with the following equation:

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

where:

p: the population proportion

n: sample size

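In our example,

$$\sigma_{\hat{p}} = \sqrt{\frac{(0.477)(1 - 0.477)}{300}} = \sqrt{\frac{0.249471}{300}} \approx 0.0288$$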

DEFINITION

The standard error of the proportion is the standard deviation of the sample proportions and can be calculated by $\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}$.
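You can watch the sample proportions behave this way by simulation. The following Python sketch draws 5,000 samples of 300 workers, treating each worker as satisfied with probability p = 0.477, and compares the mean and standard deviation of the resulting sample proportions with p and the standard-error formula. The seed and the number of samples are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(7)
p, n = 0.477, 300     # population proportion (Conference Board, 2013) and sample size
num_samples = 5_000

# Each simulated sample: n workers, each satisfied with probability p.
# The sample proportion p-hat is (number satisfied) / n.
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]

print("mean of p-hats:", round(statistics.fmean(p_hats), 4))
print("SD of p-hats:  ", round(statistics.pstdev(p_hats), 4))
print("formula:       ", round(math.sqrt(p * (1 - p) / n), 4))   # 0.0288
```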

We’re now ready to use this useful information to answer questions like, “What is the probability that 130 workers or more in my sample are satisfied with their job?”

To calculate the probability, we need to calculate the z-value for the proportion using the following equation:

$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}$$

TEST YOUR KNOWLEDGE

Do you see any resemblance between this z-value formula and the one for the sample mean? Yes, you are right, it’s basically the same. We replace $\bar{x}$ with $\hat{p}$ and then use its mean and standard deviation.

In our example, $\hat{p} = \frac{130}{300} \approx 0.433$, so we can calculate the z-value as follows:

$$z = \frac{0.433 - 0.477}{0.0288} \approx -1.53$$

Using the standard normal z-table in Appendix B:

$$P(\hat{p} \ge 0.433) = 1 - P(z < -1.53) = 1 - 0.0630 = 0.9370$$

According to this result, there is a 93.7 percent chance that 43.3 percent (130 workers) or more of the workers in our sample are satisfied with their jobs. Not bad at all! The shaded area in Figure 13.5, which displays the sampling distribution of the proportion for this example, represents this probability.

Figure 13.5

Sampling distribution of the proportion.
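Here is the proportion calculation in Python. The first part mimics the text by rounding $\hat{p}$ to 0.433, $\sigma_{\hat{p}}$ to 0.0288, and z to two decimals before the lookup, which reproduces the 0.9370; the second part skips the intermediate rounding, which gives z ≈ −1.51 and a probability of roughly 0.935. The small gap comes entirely from rounding.

```python
import math
from statistics import NormalDist

p, n = 0.477, 300
x = 130                             # satisfied workers in the sample

se_p = math.sqrt(p * (1 - p) / n)   # about 0.0288
p_hat = x / n                       # about 0.433

# Text's route: round to table precision first, then look up z.
z_rounded = (round(p_hat, 3) - p) / round(se_p, 4)
prob_rounded = 1 - NormalDist().cdf(round(z_rounded, 2))

# Same calculation without intermediate rounding.
z_exact = (p_hat - p) / se_p
prob_exact = 1 - NormalDist().cdf(z_exact)

print(round(z_rounded, 2), round(prob_rounded, 4))  # -1.53 0.937
print(round(z_exact, 2), round(prob_exact, 4))
```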

Practice Problems

1. Calculate the standard error of the mean when …

   a. σ = 10, n = 15

   b. σ = 4.7, n = 12

   c. σ = 7, n = 20

2. A population has a mean value of 16.0 and a standard deviation of 7.5. Calculate the following with a sample size of 9.

   a. $P(\bar{x} \le 17)$

   b. $P(\bar{x} > 18)$

   c. $P(14.5 \le \bar{x} \le 16.5)$

3. Calculate the standard error of the proportion when …

   a. p = 0.25, n = 200

   b. p = 0.42, n = 100

   c. p = 0.06, n = 175

4. A population proportion has been estimated at 0.32. Calculate the following with a sample size of 160.

   a. $P(\hat{p} \le 0.30)$

   b. $P(\hat{p} \le 0.36)$

   c. $P(0.29 \le \hat{p} \le 0.37)$

5. A hypothetical statistics author is obsessed with making 10-foot putts. Each day that he practices, he putts 60 times and counts the number he makes. Over the last 20 practice sessions, he has averaged 24 made putts. What is the probability that he will make at least 30 putts during his next session?

The Least You Need to Know

  • The sampling distribution of the sample means refers to the pattern of sample means that will occur as samples are drawn from the population at large.
  • According to the central limit theorem, as the sample size, n, gets larger, the sample means tend to follow a normal probability distribution.
  • The standard error of the mean is the standard deviation of sample means and can be determined by $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
  • The sampling distribution of the proportion refers to the pattern of sample proportions that will occur as samples are drawn from the population.
  • The standard error of the proportion is the standard deviation of the sample proportions and can be calculated by $\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}$.