16 Hypothesis Testing with Two Populations


In This Chapter

Developing the sampling distribution for the difference in means
Testing the difference in means between populations when the population standard deviations (σ₁ and σ₂) are known and when they are unknown
Distinguishing between independent and dependent samples
Using Excel to perform a hypothesis test
Testing the difference in proportions between populations

Now we’re really cooking. Because you have done so well with one-sample hypothesis testing, you are ready to graduate to the next level: two-sample testing. Here we often test to see whether there is a difference between two separate populations. For instance, we could test to see whether there is a difference between the average golf scores of Bob’s two sons, Brian and John. But as an “experienced” parent, Bob knew better than to go near that one.

Because many similarities exist between the concepts of this chapter and those of Chapter 15, you should have a firm handle on the previous chapter’s material before you jump into this one.

The Concept of Testing Two Populations

Many statistical studies involve comparing the same parameter, such as a mean, between two different populations. For example:

Is there a difference in average SAT scores between males and females?
Do “long-life” light bulbs really outlast standard light bulbs?
Does the average selling price of a house in Newark differ from the average selling price for a house in Wilmington?
Is there a difference between registered nurses’ salaries in New York and in California?

To answer such questions, we need to explore a new sampling distribution. (I promise this will be the last.) This one has the fanciest name of them all–the sampling distribution for the difference in means. (Dramatic background music brings us to the edge of our seats.)

DEFINITION

The sampling distribution for the difference in means describes the probability of observing various intervals for the difference between two sample means.

Sampling Distribution for the Difference in Means

Just as we built the sampling distribution for the sample mean in Chapter 13, we can build the sampling distribution for the difference in means, which is best illustrated by Figure 16.1.

As an example, let’s consider testing for a difference in the SAT scores for students between two states: North Carolina and South Carolina. We’ll assign students in North Carolina as Population 1 and South Carolina as Population 2. Graph 1 in Figure 16.1 represents the distribution of the SAT scores for North Carolina students with mean μ₁ and standard deviation σ₁. Similarly, Graph 2 represents the same for South Carolina with μ₂ and σ₂, respectively.

Figure 16.1

The sampling distribution for the difference in means.

Graph 3 represents the sampling distribution for the mean for the North Carolina students. This graph is the result of taking samples of size n₁ and plotting the distribution of sample means. Recall that we discussed this distribution of sample means back in Chapter 13. The mean of this distribution would be:

$$\mu_{\bar{x}_1} = \mu_1$$

This is according to the Central Limit Theorem from Chapter 13. The same logic holds true for Graph 4 for the South Carolina population with mean of $\mu_{\bar{x}_2} = \mu_2$.

Graph 5 in Figure 16.1 shows the distribution that represents the difference of sample means from North Carolina and South Carolina populations. This is the sampling distribution for the difference in means, which has the following mean:

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2}$$

In other words, the mean of this distribution, shown in Graph 5, is the difference between the means of Graphs 3 and 4.

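The standard deviation for Graph 5 is known as the standard error of the difference between two means and is calculated with:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

where:

$\sigma_1^2$, $\sigma_2^2$ = the variance for Populations 1 and 2

n₁, n₂ = the sample size from Populations 1 and 2

DEFINITION

The standard error of the difference between two means describes the variation in the difference between two sample means and is calculated using $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.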
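If you would rather see this sampling distribution than take it on faith, a small simulation can help. The sketch below is only an illustration: the population means (1,500 and 1,470) are hypothetical values picked for the demonstration, the populations are assumed to be normal, and the standard deviations and sample sizes are borrowed from the SAT example. It draws many pairs of samples, records the difference in sample means each time, and compares the spread of those differences with the standard error formula.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(16)  # fixed seed so the sketch is repeatable

# Hypothetical population parameters, chosen only for illustration.
mu1, sigma1, n1 = 1500, 200, 75
mu2, sigma2, n2 = 1470, 150, 70


def sample_mean(mu, sigma, n):
    """Mean of one random sample of size n from a normal population."""
    return mean(random.gauss(mu, sigma) for _ in range(n))


# Repeat the sampling many times and record the difference in sample means.
differences = [
    sample_mean(mu1, sigma1, n1) - sample_mean(mu2, sigma2, n2)
    for _ in range(2000)
]

# Standard error of the difference between two means, from the formula.
theoretical_se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

print(f"mean of differences:    {mean(differences):.1f} (theory: {mu1 - mu2})")
print(f"std dev of differences: {stdev(differences):.1f} (theory: {theoretical_se:.1f})")
```

The simulated mean and standard deviation should land close to the theoretical values, though not exactly on them, since each run is itself a sample.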

Now before you pull the rest of your hair out, let’s put these guys to work in the following section.

Testing for the Differences Between Two Population Means with Independent Samples

For this hypothesis test, we assume that the two samples are independent of each other. Samples are independent if they are not related in any way to each other. We are going to start with the case when the population standard deviations, σ₁ and σ₂, are known and then move on to the case when they are unknown.

When the Population Standard Deviations Are Known

When the sample sizes from both populations of interest are greater than 30, the central limit theorem allows us to use the normal distribution to approximate the sampling distribution for the difference in means. Let’s demonstrate this technique with the example we started in the previous section. We want to test whether there’s a difference in the average SAT scores for students in North Carolina and in South Carolina. To investigate that, we selected two random samples from the two states. A random sample of 75 students from North Carolina shows an average SAT score of 1,480. A random sample of 70 students from South Carolina shows an average SAT score of 1,450. Assume that the population standard deviation for the SAT scores in North Carolina is 200 and in South Carolina is 150. Using α = 0.05, we want to test whether there is a difference in the SAT scores between these two states. How do we do that? Read on!

We are going to follow the same 5 steps we used in Chapter 15.

Step 1: State the null and alternative hypotheses.

This is a two-tail test since we are testing whether there is a difference between μ₁ and μ₂. The null hypothesis states that there is no difference, μ₁ = μ₂. (Another way to write this is μ₁ - μ₂ = 0.) The alternative hypothesis states that there is a difference, μ₁ ≠ μ₂, or equivalently μ₁ - μ₂ ≠ 0. So we can write our hypotheses like this:

H₀ : μ₁ - μ₂ = 0

H₁ : μ₁ - μ₂ ≠ 0

Step 2: Determine the level of significance: we are using α = 0.05.

Step 3: Calculate the z-test statistic:

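The calculated z-test statistic is determined by the following equation:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$

We start with the standard error of the difference between two means as follows:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{200^2}{75} + \frac{150^2}{70}} = 29.2363$$

We are now ready to determine the calculated z-test statistic as follows:

$$z = \frac{(1,480 - 1,450) - 0}{29.2363} = 1.03$$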

BOB’S BASICS

The term $(\mu_1 - \mu_2)_{H_0}$ refers to the hypothesized difference between the two population means. When the null hypothesis is testing that there is no difference between population means, then the term $(\mu_1 - \mu_2)_{H_0}$ is set to 0.

Step 4: Determine the critical values

The critical z-values for a two-tail test with α = 0.05 are ±1.96. Figure 16.2 shows the results of this hypothesis test.

Step 5: State your decision.

According to Figure 16.2, the calculated z-test statistic of 1.03 falls within the “Don’t Reject H₀” region, which leads us to conclude that there’s no significant difference between the average SAT scores for North Carolina and South Carolina.

Figure 16.2

Hypothesis test for the SAT example.

As we did in Chapter 15, we can also find the p-value for this example by using the standard normal z table found in Appendix B as follows:

P(z > +1.03) = 1 – P(z ≤ +1.03) = 1 – 0.8485 = 0.1515

Because this is a two-tail test, we need to double this area to arrive at our p-value, which is 2 × 0.1515 = 0.303. Because the p-value > α, we do not reject the null hypothesis.
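The five steps for the SAT example can also be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `two_sample_z` is made up for this sketch. Because the code carries the unrounded z-value through the calculation, its p-value comes out slightly above the 0.303 we read from the table using z = 1.03.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """Return (standard error, z) for a difference in means with known sigmas."""
    std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = ((mean1 - mean2) - hypothesized_diff) / std_error
    return std_error, z


alpha = 0.05

# North Carolina is Population 1, South Carolina is Population 2.
std_error, z = two_sample_z(1480, 1450, 200, 150, 75, 70)

# Two-tail test: split alpha between both tails and double the tail area.
critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"standard error = {std_error:.4f}")
print(f"z = {z:.2f}, critical values = ±{critical:.2f}")
print(f"p-value = {p_value:.3f}")
print("Reject H0" if abs(z) > critical else "Do not reject H0")
```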

TEST YOUR KNOWLEDGE

This technique applies to large samples, but what if we have small samples? We can also apply this technique to hypothesis testing that involves sample sizes less than 30 as long as both populations are normally distributed.

Testing a Difference Other Than Zero

In the previous example, we were just testing whether or not there was any difference between the two populations. We can also test whether the difference exceeds a certain value. As an example, suppose we want to test the hypothesis that the average salary of a mathematician in Florida exceeds the average salary of a mathematician in Texas by more than $5,000. We would state the hypotheses as follows:

H₀ : μ₁ – μ₂ ≤ 5,000

H₁ : μ₁ – μ₂ > 5,000

where:

μ₁ = the mean salary of a mathematician in Florida

μ₂ = the mean salary of a mathematician in Texas

We’ll assume that σ₁ = $8,100 and σ₂ = $7,600, and we’ll test this hypothesis at the α = 0.05 level.

A sample of 42 mathematicians from Florida had a mean salary of $85,500, whereas a sample of 54 mathematicians from Texas had a mean salary of $76,000.

The standard error of the difference between two means is:

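$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{8,100^2}{42} + \frac{7,600^2}{54}} = \$1,622.30$$

Our calculated z-test statistic becomes:

$$z = \frac{(85,500 - 76,000) - 5,000}{1,622.30} = 2.77$$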

The results of this hypothesis test are shown in Figure 16.3.

Figure 16.3

The hypothesis test for the mathematicians’ salary example.

The critical z-value for a one-tail (right side) test with α = 0.05 is +1.65. According to Figure 16.3, this places the calculated z-test statistic of +2.77 in the “Reject H₀” region, which leads to our conclusion that the difference in salaries between the two states exceeds $5,000.
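To see how the hypothesized difference of $5,000 enters the arithmetic, here is a short Python sketch of the same calculation. It is an illustration only, with variable names chosen for this example. The one thing that changes from the zero-difference case is that 5,000 is subtracted in the numerator, and the test is one-tail, so all of α sits in the right tail.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
hypothesized_diff = 5000  # H0: mu1 - mu2 <= 5,000

# Florida is Population 1, Texas is Population 2.
mean1, sigma1, n1 = 85500, 8100, 42
mean2, sigma2, n2 = 76000, 7600, 54

std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z = ((mean1 - mean2) - hypothesized_diff) / std_error

# One-tail (right side) test: the whole of alpha is in the upper tail.
critical = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

print(f"standard error = {std_error:.2f}")
print(f"z = {z:.2f}, critical value = {critical:.3f}")
print("Reject H0" if z > critical else "Do not reject H0")
```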

When the Population Standard Deviations Are Unknown

In many cases, the population standard deviation, σ, is not known. So what can we do? This should feel familiar by now, because we made the same adjustments for small sample sizes back in Chapters 14 and 15. We have to make two changes to the previous technique:

1. We use the sample standard deviation, s, to approximate the population standard deviation, σ.

2. We use the t-distribution instead of the z-distribution. However, if one or both of our sample sizes is less than 30, then the population needs to be normally distributed to use any of the following techniques.

In this case when the population standard deviations are unknown, we have two cases to consider:

If the population variances are not equal.
If the population variances are equal.

As you’ll see below, the equation for the standard error of the difference between the two means, $\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}$, will differ with each case. Let’s look at each instance in detail.

Unequal Population Variances When the Population Standard Deviations Are Unknown

In this case we are assuming that the population standard deviations are not equal, σ₁ ≠ σ₂. As we did in Chapter 15, we can follow the same steps but with the two changes in steps 3 and 4.

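In step 3, the standard error of the difference between two means is as follows:

$$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

The calculated t-test statistic will be:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}}$$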

In step 4, we get the critical value(s) using the t-distribution instead of the z-distribution. The sampling distribution for the difference between sample means for this scenario follows the Student’s t-distribution.

Let’s illustrate this with an example. According to the Bureau of Labor Statistics, in May 2011 California and Massachusetts were among the top-paying states for registered nurses. Let’s say I want to check whether there’s a difference in the annual salary for registered nurses between California and Massachusetts. I selected a random sample of 25 registered nurses from California and found their average salary is $90,800 with a standard deviation of $9,050. I selected another random sample of 20 registered nurses from Massachusetts and found their average salary is $86,400 with a standard deviation of $8,020. Assuming that the annual salaries for registered nurses in both states are normally distributed, I’ll test at the α = 0.01 level and start with my hypotheses:

H₀ : μ₁ – μ₂ = 0

H₁ : μ₁ – μ₂ ≠ 0

Where:

μ₁ = the mean annual salary for registered nurses in California.

μ₂ = the mean annual salary for registered nurses in Massachusetts.

Next, I’m going to calculate the standard error of the difference between two means as follows:

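$$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{9,050^2}{25} + \frac{8,020^2}{20}} = 2,547.96$$

We are now ready to determine the calculated t-test statistic as follows:

$$t = \frac{(90,800 - 86,400) - 0}{2,547.96} = 1.73$$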

Now, here is something different from the previous cases. To determine the degrees of freedom for the t-distribution, we use the following equation in this case (hold on to your hat):

$$d.f. = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2 - 1}}$$

Before you have a seizure, let me demonstrate that this animal’s bark is worse than its bite. In our registered nurses’ salary example, we start with the following terms:

$$\frac{s_1^2}{n_1} = \frac{9,050^2}{25} = 3,276,100 \quad \text{and} \quad \frac{s_2^2}{n_2} = \frac{8,020^2}{20} = 3,216,020$$

We just substitute these values into the previous d.f. equation:

$$d.f. = \frac{(3,276,100 + 3,216,020)^2}{\frac{(3,276,100)^2}{25 - 1} + \frac{(3,216,020)^2}{20 - 1}} = 42.5064$$

42.5064 falls between 40 and 45. Some books will round to the nearest number. To be on the conservative side, though, we are going to round down to 40. At α = 0.01 and degrees of freedom = 40, the critical t-value = ±2.704. The results of this hypothesis test are shown in Figure 16.4.

Figure 16.4

Hypothesis test for the registered nurses’ salary example.

The calculated t-test statistic falls in the “Don’t Reject H₀” region, so based on our result there is no significant difference between registered nurses’ salaries in California and Massachusetts.

Some books use a simpler degrees of freedom equation in cases like this. They would use the smaller of n₁ – 1 and n₂ – 1. Using this, we would still get the same result for our example. At α = 0.01 and degrees of freedom = 19, the critical t-value is ±2.861. Our calculated t-test statistic still falls in the “Don’t Reject H₀” region and our decision won’t change by using this critical value.
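The degrees of freedom equation is much less scary when a computer does the arithmetic. The Python sketch below is only an illustration (the function name `unequal_variance_t` is invented here); it reproduces the standard error, the t-test statistic, and both versions of the degrees of freedom for the nurses’ salary example. The critical values are the table values quoted above rather than something the code computes.

```python
from math import sqrt


def unequal_variance_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Return (standard error, t, d.f.) without pooling the sample variances."""
    v1 = s1**2 / n1  # s1 squared over n1
    v2 = s2**2 / n2  # s2 squared over n2
    std_error = sqrt(v1 + v2)
    t = ((mean1 - mean2) - hypothesized_diff) / std_error
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return std_error, t, df


# California is Population 1, Massachusetts is Population 2.
std_error, t, df = unequal_variance_t(90800, 9050, 25, 86400, 8020, 20)
simple_df = min(25 - 1, 20 - 1)  # the simpler rule some books use

print(f"standard error = {std_error:,.2f}")
print(f"t = {t:.2f}")
print(f"d.f. = {df:.4f} (simpler rule: {simple_df})")

# Two-tail critical t-values at alpha = 0.01, taken from the text.
for label, critical in (("d.f. = 40", 2.704), ("d.f. = 19", 2.861)):
    decision = "Reject H0" if abs(t) > critical else "Do not reject H0"
    print(f"{label}: {decision}")
```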

This procedure was based on the assumption that the standard deviations of the populations were unequal. What if this assumption is not true? I’m glad you asked!

Equal Population Variances When the Population Standard Deviations Are Unknown

This case differs from the previous one (unequal population variances) in two ways: the standard error of the difference between two means and the calculated t-test statistics.

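We are assuming that σ₁ = σ₂, but that the values of σ₁ and σ₂ are unknown. Under these conditions, we calculate a pooled variance by combining two sample variances into one using the following equation:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

The pooled standard deviation is $s_p = \sqrt{s_p^2}$. The standard error of the difference between two means is as follows:

$$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

The calculated t-test statistic in this case is:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

with d.f. = n₁ + n₂ – 2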

Don’t panic just yet. These equations look a whole lot better with numbers plugged in, so let’s apply them to an example.

Bob has a very mysterious occurrence in his household–batteries seem to vanish into thin air. So Bob started buying them in 24-packs at the warehouse store, naively thinking that “these will last a long time.” Wrong again–the more he bought, the faster they disappeared. Maybe it has something to do with certain teenagers listening to music on their portable CD players at a “brain-numbing” volume into the wee hours of the morning. Just a thought. Bob heard about the new “longer-lasting battery,” and he wanted to investigate it. Let’s say a company is promoting one of these batteries, claiming that its life is significantly longer than regular batteries. The hypothesis statement would be:

H₀ : μ₁ – μ₂ ≤ 0

H₁ : μ₁ – μ₂ > 0

where:

μ₁ = the mean life of the long-lasting batteries

μ₂ = the mean life of the regular batteries

We’ll choose to test this hypothesis at the α = 0.01 level. The following data was collected measuring the battery life in hours for both types of batteries:

Raw Data for Battery Example

Long-Lasting Battery (Population 1):
51 44 58 36 48 53 57 40 49 44 60 50

Regular Battery (Population 2):
42 29 51 38 39 44 35 40 48 45

Using Excel, we can summarize this data in the following table.

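Summarized Battery Data

Population	Sample Mean (hours)	Sample Standard Deviation (hours)	Sample Size (n)
Long-Lasting Battery (1)	49.2	7.31	12
Regular Battery (2)	41.1	6.40	10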

In this example, since we have small samples, we’ll have to make the assumption that the battery life in both populations is normally distributed in order to conduct this test. Now we can plug these numbers into the pooled standard deviation equation:

$$s_p = \sqrt{\frac{(12 - 1)(7.31)^2 + (10 - 1)(6.40)^2}{12 + 10 - 2}} = 6.92 \text{ hours}$$

DEFINITION

The pooled standard deviation combines two sample variances into one variance and is calculated using $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$.

We are now ready to determine our calculated t-test statistic as follows:

$$t = \frac{(49.2 - 41.1) - 0}{6.92\sqrt{\frac{1}{12} + \frac{1}{10}}} = \frac{8.1}{2.963} = 2.73$$

The number of degrees of freedom for this test are:

d.f. = n₁ + n₂ – 2 = 12 + 10 – 2 = 20

The critical t-value, taken from Table 4 in Appendix B, for a one-tail (right) test using α = 0.01 with d.f. = 20 is +2.528. This hypothesis test is shown graphically in Figure 16.5.

Figure 16.5

The hypothesis test for the battery life example.

According to Figure 16.5, our calculated t-test statistic of +2.73 is found in the “Reject H₀” region, which leads to our conclusion that the long-lasting batteries do indeed have a longer life than the regular batteries. Now that would have made Bob happy!
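If you would like to check the battery calculations straight from the raw data, the following Python sketch does so with the standard library. Treat it as an illustration under the same equal-variance assumption; the variable names are ours. Because it works with unrounded means and variances, its t-value comes out at about 2.72 rather than the 2.73 we got from the rounded summary numbers, which does not change the decision.

```python
from math import sqrt
from statistics import mean, variance

# Battery life in hours, copied from the raw data table.
long_lasting = [51, 44, 58, 36, 48, 53, 57, 40, 49, 44, 60, 50]
regular = [42, 29, 51, 38, 39, 44, 35, 40, 48, 45]

n1, n2 = len(long_lasting), len(regular)

# statistics.variance() is the sample variance (divides by n - 1).
pooled_variance = (
    (n1 - 1) * variance(long_lasting) + (n2 - 1) * variance(regular)
) / (n1 + n2 - 2)
pooled_sd = sqrt(pooled_variance)

std_error = pooled_sd * sqrt(1 / n1 + 1 / n2)
t = ((mean(long_lasting) - mean(regular)) - 0) / std_error
df = n1 + n2 - 2

critical_t = 2.528  # one-tail, alpha = 0.01, d.f. = 20, from the text

print(f"pooled standard deviation = {pooled_sd:.2f} hours")
print(f"t = {t:.2f} with d.f. = {df}")
print("Reject H0" if t > critical_t else "Do not reject H0")
```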

RANDOM THOUGHTS

These conditions are necessary for the hypothesis testing for differences between two means:

The samples are independent of each other.
If one or both of the sample sizes is small (<30), then the population must be normally distributed.
If σ₁ and σ₂ are known, use the normal distribution to determine the rejection region(s).
If σ₁ and σ₂ are unknown, approximate them with s₁ and s₂ and use the Student’s t-distribution to determine the rejection region(s).

To Pool or Not to Pool?

In the previous section, we explored two cases for hypotheses testing between two populations: when the population variances are assumed to be equal and when they are unequal. I know you are asking, “When should I use each one?” Great question. Read on!

If the two samples are taken from the same population, then it is safe to assume equal population variances. For example, we could assume equal population variances if a researcher is interested in testing the effectiveness of a medication. The researcher chooses two different random samples from the same population: one sample receives the medication and the other sample receives the placebo. In this case, since the two samples are chosen from the same population, it is safe to assume equal population variances.

Another method is to compare the two sample variances, $s_1^2$ and $s_2^2$. These sample variances are estimates of the population variances. If they are close in value, then assume equal population variances. If they are not close, then assume unequal population variances. “How close is close?” you are asking. A simple rule of thumb is that if one of the sample variances is more than twice as large as the other sample variance, then it is safe to assume unequal population variances.

In addition, there are tests that you can perform to test for equal variances. However, these are outside the scope of this book.
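The rule of thumb is simple enough to sketch in code. The helper below is hypothetical (the name `looks_like_equal_variances` and the `max_ratio` parameter are ours), and it assumes neither sample variance is zero. It is a rough screen, not a formal test for equal variances. Here it is applied to the battery data from earlier in the chapter.

```python
from statistics import variance


def looks_like_equal_variances(sample1, sample2, max_ratio=2.0):
    """Rule of thumb: pool unless one sample variance is more than twice the other.

    Returns (decision, ratio of larger to smaller sample variance).
    """
    v1, v2 = variance(sample1), variance(sample2)
    ratio = max(v1, v2) / min(v1, v2)
    return ratio <= max_ratio, ratio


long_lasting = [51, 44, 58, 36, 48, 53, 57, 40, 49, 44, 60, 50]
regular = [42, 29, 51, 38, 39, 44, 35, 40, 48, 45]

pool, ratio = looks_like_equal_variances(long_lasting, regular)
print(f"variance ratio = {ratio:.2f}")
print("Assume equal variances" if pool else "Assume unequal variances")
```

For the battery data the larger variance is only about 1.3 times the smaller one, so pooling is reasonable by this rule.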

Testing for Differences Between Means with Dependent Samples

Up to this point, all the samples that we have used in the chapter have been independent samples. Independent samples are not related in any way with each other. This is in contrast to dependent samples, where each observation of one sample is related to an observation in another sample.

DEFINITION

With independent samples, there is no relationship in the observations between the samples. With dependent samples, the observation from one sample is related to an observation from another sample.

An example of a dependent sample would be a weight-loss study. Each person is weighed at the beginning (Population 1) and end (Population 2) of the program. The change in weight of each person is calculated by subtracting the Population 2 weights from the Population 1 weights. Each observation from Population 1 is matched to an observation in Population 2. This hypothesis test is also known as a “matched-pair test.” Dependent samples are tested differently than independent samples.

To demonstrate testing dependent samples, let’s use the following example. A pharmaceutical company is introducing a new medication to lower cholesterol levels for patients. To test the effectiveness of their new medication, they selected a random sample of 9 people and measured their cholesterol levels before and after they took the medication. Cholesterol level is measured in milligrams per deciliter (mg/dL). The following table shows the results. The letter “d” refers to the difference between the cholesterol levels before and after taking the medication. The variable d is also called the matched-pair difference and is calculated as d = x₁ – x₂.

Differences in Cholesterol Levels Example

For future calculations, we will need:

$$\sum d = 191$$

$$\sum d^2 = 8,297$$

Patients’ cholesterol levels before taking the medication will be considered Population 1, and after taking the medication will be considered Population 2. Because we are using the same patient’s cholesterol level before and after taking the medication, these two samples are considered dependent.

Since the claim we are testing is μ₁ > μ₂, we can write our null and alternative hypotheses as:

H₀ : μ₁ – μ₂ ≤ 0

H₁ : μ₁ – μ₂ > 0

where:

μ₁ = the average cholesterol level for patients before taking the new medication.

μ₂ = the average cholesterol level for patients after taking the new medication.

However, because we are only interested in the difference between the two populations, we can rewrite this statement as a single sample hypothesis as follows:

H₀ : μ_d ≤ 0

H₁ : μ_d > 0

where μ_d is the mean of the difference between the two populations.

We will test this hypothesis using α = 0.05.

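Our next step is to calculate the mean, $\bar{d}$, and the standard deviation, s_d, of the matched-pair difference as follows:

$$\bar{d} = \frac{\sum d}{n} = \frac{191}{9} = 21.22 \text{ mg/dL}$$

This number tells us that the new medication lowers the cholesterol level by an average of about 21 mg/dL.

$$s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}} = \sqrt{\frac{8,297 - \frac{(191)^2}{9}}{9 - 1}} = 23.03 \text{ mg/dL}$$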

TEST YOUR KNOWLEDGE

Does the equation for s_d look familiar? Yes, it is the same standard deviation equation you learned in Chapter 5.

If both populations follow the normal distribution, we use the Student’s t-distribution because both sample sizes are less than 30 and σ₁ and σ₂ are unknown. The calculated t-test statistic is:

$$t = \frac{\bar{d} - (\mu_d)_{H_0}}{s_d / \sqrt{n}} = \frac{21.22 - 0}{23.03 / \sqrt{9}} = 2.76$$

The number of degrees of freedom for this test are:

d.f. = n – 1 = 9 – 1 = 8

The critical t-value, taken from Table 4 in Appendix B, for a one-tail (right) test using α = 0.05 with d.f. = 8 is +1.86. This hypothesis test is shown graphically in Figure 16.6.

Figure 16.6

The hypothesis test for the cholesterol difference example.

According to Figure 16.6, our calculated t-test statistic of +2.76 falls into the “Reject H₀” region, which leads us to conclude that the new medication lowers cholesterol levels. Good news for the pharmaceutical company!
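The matched-pair arithmetic can be checked from the two sums alone. The Python sketch below is an illustration using only the summary values Σd = 191 and Σd² = 8,297 from the text; the variable names are ours, and the critical value is the table value quoted above.

```python
from math import sqrt

n = 9            # number of matched pairs
sum_d = 191      # sum of the differences d = x1 - x2
sum_d_sq = 8297  # sum of the squared differences

d_bar = sum_d / n
s_d = sqrt((sum_d_sq - sum_d**2 / n) / (n - 1))

# One-sample t-test on the differences, with H0: mu_d <= 0.
t = (d_bar - 0) / (s_d / sqrt(n))
df = n - 1

critical_t = 1.86  # one-tail, alpha = 0.05, d.f. = 8, from the text

print(f"mean difference = {d_bar:.2f} mg/dL")
print(f"standard deviation of differences = {s_d:.2f} mg/dL")
print(f"t = {t:.2f} with d.f. = {df}")
print("Reject H0" if t > critical_t else "Do not reject H0")
```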

Letting Excel Do the Grunt Work

Excel performs many of the hypothesis tests that we’ve discussed in this chapter. Let me show you a few examples using this nifty tool. (First make sure that your Data Analysis Add-In is installed. Refer to the section “Installing the Data Analysis Add-In” from Chapter 2 if you don’t see the command.) I’ll start with the previous battery example. Follow these steps:

1. Open a blank Excel sheet and enter the data from the battery example in Columns A and B as shown in Figure 16.7.

2. From the Data tab at the top of the Excel window, choose Data Analysis and select t-Test: Two-Sample Assuming Unequal Variances.

3. Click OK.

Figure 16.7

Data entry for the battery example.

4. In the t-Test: Two-Sample Assuming Unequal Variances dialog box, choose cells A1:A13 for the Variable 1 Range and cells B1:B11 for the Variable 2 Range. Set the Hypothesized Mean Difference to 0, Alpha to 0.01, Output Range to cell D2, and check the Labels box since we have the labels in the first cells, as shown in Figure 16.8.

Figure 16.8

The t-Test: Two Sample Assuming Unequal Variances dialog box.

5. Click OK. The t-test output is shown in Figure 16.9.

Figure 16.9

t-Test: Two Sample Assuming Unequal Variances output.

According to Figure 16.9, the calculated t-test statistic is 2.759, found in cell E10. This is close to, but not the same as, the 2.73 we calculated in the previous section, because that calculation pooled the two sample variances while this Excel tool assumes unequal variances. The p-value of 0.006 is found in cell E11. Because the p-value < α, we reject the null hypothesis.
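For readers without Excel handy, the unequal-variances calculation for the battery data can be sketched in Python. This is an illustration of the formulas from this chapter, not a description of how Excel computes its output, and the variable names are ours. It reproduces the 2.759 reported above and shows the pooled-variance value alongside it for comparison.

```python
from math import sqrt
from statistics import mean, variance

long_lasting = [51, 44, 58, 36, 48, 53, 57, 40, 49, 44, 60, 50]
regular = [42, 29, 51, 38, 39, 44, 35, 40, 48, 45]

n1, n2 = len(long_lasting), len(regular)
diff = mean(long_lasting) - mean(regular)

# Unequal variances: each sample variance is divided by its own sample size.
v1 = variance(long_lasting) / n1
v2 = variance(regular) / n2
t_unequal = diff / sqrt(v1 + v2)
df_unequal = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Equal variances: the two sample variances are pooled first.
pooled_variance = (
    (n1 - 1) * variance(long_lasting) + (n2 - 1) * variance(regular)
) / (n1 + n2 - 2)
t_pooled = diff / sqrt(pooled_variance * (1 / n1 + 1 / n2))

print(f"unequal variances: t = {t_unequal:.3f}, d.f. = {df_unequal:.2f}")
print(f"pooled variances:  t = {t_pooled:.3f}, d.f. = {n1 + n2 - 2}")
```

The two t-values differ only slightly here because the two sample variances are fairly close to each other.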

If you look at the Excel Data Analysis tool box in Figure 16.7, you see that we can do hypothesis tests for all four cases we had in this chapter. Besides the unequal-variances t-test we just ran, you can use z-Test: Two Sample for Means to do hypothesis testing when the population standard deviations (σ₁ and σ₂) are known. You can use t-Test: Two-Sample Assuming Equal Variances to do hypothesis testing when the population standard deviations (σ₁ and σ₂) are unknown and assuming equal population variances. You can use t-Test: Paired Two Sample for Means to do hypothesis testing with dependent samples.

Let’s see how to use Excel to do hypothesis testing when the population standard deviations (σ₁ and σ₂) are known, a z-test. We are going to use our previous example of SAT scores between North Carolina and South Carolina, where σ₁ for North Carolina = 200 and σ₂ for South Carolina = 150. I’m going to use a smaller sample to make the calculations easier and faster–we’ll use 35 and 30 respectively, instead of 75 and 70 as we did in our example above using the formula. Just like the previous example, we will enter the data in columns A and B and choose z-Test: Two Sample for Means from the Data Analysis box.

In the z-Test: Two Sample for Means dialog box, choose cells A1:A36 for the Variable 1 Range and cells B1:B31 for the Variable 2 Range. Set the Hypothesized Mean Difference to 0, Alpha to 0.05, Variable 1 Variance to 40000 (which is (200)²), Variable 2 Variance to 22500 (which is (150)²), Output Range to cell D1, and check the Labels box since we have the labels in the first cells as shown in Figure 16.10.

Figure 16.10

The z-Test: Two Sample for Means dialog box.

Click OK. The z-test output is shown in Figure 16.11.

Figure 16.11

The z-Test: Two Sample for Means output.

According to Figure 16.11, the calculated z-test statistic is -1.2987, found in cell E8, and the z-critical value is ±1.96, found in cell E12. The p-value is 0.194, found in cell E11. Because p-value > α, we don’t reject the null hypothesis.

Let’s go through one more example of how to use Excel to do hypothesis testing with dependent samples. We are going to use our previous example of the differences in cholesterol levels. Just like in the last Excel example, we will enter the data in columns A and B and choose t-Test: Paired Two Sample for Means from the Data Analysis box.

In the t-Test: Paired Two Sample for Means dialog box, choose cells A1:A10 for the Variable 1 Range and cells B1:B10 for the Variable 2 Range. Set the Hypothesized Mean Difference to 0, Alpha to 0.05, Output Range to cell D2, and check the Labels box since we have the labels in the first cells, as shown in Figure 16.12.

Figure 16.12

The t-Test: Paired Two Sample for Means dialog box.

Click OK. The t-test output is shown in Figure 16.13.

Figure 16.13

The t-Test: Paired Two Sample for Means output.

According to Figure 16.13, the calculated t-test statistic is 2.76, found in cell E11. The p-value is 0.012, found in cell E12. Because p-value < α, we reject the null hypothesis.

Testing for Differences Between Proportions with Independent Samples

We can perform hypothesis testing to examine the difference between proportions of two populations as long as the sample size is large enough. Recall from Chapter 13, proportion data follow the binomial distribution, which can be approximated by the normal distribution under the following conditions.

np ≥ 5 and nq ≥ 5

where:

p = the probability of a success in the population

q = the probability of a failure in the population (q = 1 – p)

Let’s say that I want to test whether the proportions of males and females who use Facebook every day are the same. My hypotheses would be stated as:

H₀ : p₁ = p₂

H₁ : p₁ ≠ p₂

where:

p₁ = the proportion of males who use Facebook every day

p₂ = the proportion of females who use Facebook every day

The following table summarizes the data from the Facebook samples:

Summarized Data for Facebook Samples

Population	Number of Successes (x)	Sample Size (n)
Male	207	300
Female	266	350

What can we conclude at α = 0.10 level?

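Our sample proportion of male Facebook users, $\bar{p}_1$, and female users, $\bar{p}_2$, can be found by:

$$\bar{p}_1 = \frac{x_1}{n_1} = \frac{207}{300} = 0.69 \quad \text{and} \quad \bar{p}_2 = \frac{x_2}{n_2} = \frac{266}{350} = 0.76$$

To determine the calculated z-test statistic, we need to know the standard error of the difference between two proportions (that’s a mouthful), $\sigma_{\bar{p}_1 - \bar{p}_2}$, which is found using:

$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

Our problem is that we don’t know the values of p₁ and p₂, the actual population proportions of male and female Facebook users. The next best thing is to calculate the estimated standard error of the difference between two proportions, $\hat{\sigma}_{\bar{p}_1 - \bar{p}_2}$, using the following equation:

$$\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $\hat{p}$, the estimated overall proportion of two populations, is found using the following equation:

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{207 + 266}{300 + 350} = 0.728$$

For our Facebook example, the estimated standard error of the difference between two proportions is:

$$\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{(0.728)(1 - 0.728)\left(\frac{1}{300} + \frac{1}{350}\right)} = 0.035$$

Now we can finally determine the calculated z-test statistic using:

$$z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)_{H_0}}{\hat{\sigma}_{\bar{p}_1 - \bar{p}_2}}$$

For the Facebook example, our calculated z-test statistic becomes:

$$z = \frac{(0.69 - 0.76) - 0}{0.035} = -2.00$$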

BOB’S BASICS

The term $(p_1 - p_2)_{H_0}$ refers to the hypothesized difference between the two population proportions. When the null hypothesis is testing that there is no difference between population proportions, then the term $(p_1 - p_2)_{H_0}$ is set to 0.

The critical z-values for a two-tail test with α = 0.10 are +1.65 and -1.65. Figure 16.14 shows this hypothesis test graphically.

Figure 16.14

The hypothesis test for the Facebook example.

As you can see in Figure 16.14, the calculated z-test statistic of -2.00 falls in the “Reject H₀” region. Therefore, we conclude that the proportions of males and females who use Facebook every day are not equal to each other. I’m sure you are not surprised by this conclusion!

BOB’S BASICS

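The standard error of the difference between two proportions describes the variation in the difference between two sample proportions and is calculated using $\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$. The estimated standard error of the difference between two proportions approximates the variation in the difference between two sample proportions and is calculated using $\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$. The estimated overall proportion of two populations is the weighted average of two sample proportions and is calculated using $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$.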

The p-value for these samples can be found using the normal z distribution table found in Appendix B as follows:

2 × P(z < -2.00) = 2 × (0.0228)= 0.0456

This also confirms that we reject H₀ because the p-value < α.
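The whole Facebook calculation fits in a short Python sketch. As before, this is an illustration with names chosen for the example, using only the standard library. It carries unrounded values through the arithmetic, so expect the z-value and p-value to agree with the hand calculation only after rounding.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.10

# Males are Population 1, females are Population 2.
x1, n1 = 207, 300
x2, n2 = 266, 350

p1_bar = x1 / n1                 # sample proportion, males
p2_bar = x2 / n2                 # sample proportion, females
p_hat = (x1 + x2) / (n1 + n2)    # estimated overall proportion

# Estimated standard error of the difference between two proportions.
std_error = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
z = ((p1_bar - p2_bar) - 0) / std_error

# Two-tail test: split alpha between both tails and double the tail area.
critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"overall proportion = {p_hat:.3f}")
print(f"estimated standard error = {std_error:.3f}")
print(f"z = {z:.2f}, critical values = ±{critical:.3f}")
print(f"p-value = {p_value:.4f}")
print("Reject H0" if abs(z) > critical else "Do not reject H0")
```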

This completes our invigorating journey through the land of hypothesis testing.

Practice Problems

1. Test the hypothesis that the average SAT math scores from students in Pennsylvania and Ohio are different. A sample of 45 students from Pennsylvania had an average score of 552, whereas a sample of 38 Ohio students had an average score of 530. Assume the population standard deviations for Pennsylvania and Ohio are 105 and 114, respectively. Test at the α = 0.05 level. What is the p-value for these samples?

2. A company tracks satisfaction scores based on customer feedback from individual stores on a scale of 0 to 100. The following data represents the customer scores from Stores 1 and 2.

Store 1:

90 87 93 75 88 96 90 82 95 97 78

Store 2:

82 85 90 74 80 89 75 81 93 75

Assume population standard deviations are equal but unknown and that the population is normally distributed. Using α = 0.10, test the hypothesis that the customer scores in the two stores are different.

3. A new diet program claims that participants will lose more than 15 pounds after completion of the program. The following data represents the before and after weights of nine individuals who completed the program. Test the claim at the α = 0.05 level.

Before: 221 215 206 185 202 197 244 188 218

After: 200 192 195 166 187 177 227 165 201

4. Test the hypothesis that the proportion of home ownership in the state of Florida exceeds the national proportion at the α = 0.01 level using the following data.

Population	Number of Successes	Sample Size
Florida	272	400
Nation	390	600

What is the p-value for these samples?

5. Test the hypothesis that the average hourly wage for City A is more than $0.50 per hour above the average hourly wage in City B using the following sample data:

City	Average Wage	Standard Deviation	Sample Size
A	$9.80	$2.25	60
B	$9.10	$2.70	80

Test at the α = 0.05 level. What is the p-value for this test?

6. Test the hypothesis that the average number of days that a home is on the market in City A is different from City B using the following sample data:

Assume population standard deviations are unequal and that the population is normally distributed. Test the hypothesis using α = 0.10.

The Least You Need to Know

We use the normal distribution for the hypothesis test for the difference between means when n ≥ 30 for both samples.
We use the normal distribution for the hypothesis test for the difference between means when σ₁ and σ₂ are known, n < 30 for either sample, and both populations are normally distributed.
We use the Student’s t-distribution for the hypothesis test for the difference between means when σ₁ and σ₂ are unknown, n < 30 for either sample, and both populations are normally distributed.
With dependent samples, the observation from one sample is related to an observation from another sample. With independent samples, there is no relationship in the observations between the samples.
