8.1 Comparing Two Population Proportions: Independent Sampling

Teaching Tip

Discuss the basic differences between a population mean and a population proportion. Help the student identify the key words that will suggest either means or proportions.

Suppose a presidential candidate wants to compare the preferences of registered voters in the northeastern United States with those in the southeastern United States. Such a comparison would help determine where to concentrate campaign efforts. The candidate hires a professional pollster to randomly choose 1,000 registered voters in the northeast and 1,000 in the southeast and interview each to learn her or his voting preference. The objective is to use this sample information to make an inference about the difference $(p_{1} - p_{2})$ between the proportion $p_{1}$ of all registered voters in the northeast and the proportion $p_{2}$ of all registered voters in the southeast who plan to vote for the presidential candidate.

The two samples represent independent binomial experiments. (See Section 4.3 for the characteristics of binomial experiments.) The binomial random variables are the numbers $x_{1}$ and $x_{2}$ of the 1,000 sampled voters in each area who indicate that they will vote for the candidate. The results are summarized in Table 8.1.

Table 8.1 Results of Poll

Northeast	Southeast
$n_{1} = 1{,}000$	$n_{2} = 1{,}000$
$x_{1} = 546$	$x_{2} = 475$

We can now calculate the sample proportions ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$ of the voters in favor of the candidate in the northeast and southeast, respectively:

${\hat{p}}_{1} = \frac{x_{1}}{n_{1}} = \frac{546}{1{,}000} = .546 \qquad {\hat{p}}_{2} = \frac{x_{2}}{n_{2}} = \frac{475}{1{,}000} = .475$

The difference between the sample proportions $({\hat{p}}_{1} - {\hat{p}}_{2})$ makes an intuitively appealing point estimator of the difference between the population proportions $(p_{1} - p_{2})$. For this example, the estimate is

$({\hat{p}}_{1} - {\hat{p}}_{2}) = .546 - .475 = .071$

To judge the reliability of the estimator $({\hat{p}}_{1} - {\hat{p}}_{2})$, we must observe its performance in repeated sampling from the two populations. That is, we need to know the sampling distribution of $({\hat{p}}_{1} - {\hat{p}}_{2})$. The properties of the sampling distribution are given in the next box. Remember that ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$ can be viewed as means of the number of successes per trial in the respective samples, so the Central Limit Theorem applies when the sample sizes are large.

Teaching Tip

Point out that the sample size requirement is the same in working with two proportions as it was when we worked with a single proportion. It must be checked for each sample now.

Properties of the Sampling Distribution of $({\hat{p}}_{1} - {\hat{p}}_{2})$

The mean of the sampling distribution of $({\hat{p}}_{1} - {\hat{p}}_{2})$ is $(p_{1} - p_{2})$; that is,

$E ({\hat{p}}_{1} - {\hat{p}}_{2}) = p_{1} - p_{2}$

Thus, $({\hat{p}}_{1} - {\hat{p}}_{2})$ is an unbiased estimator of $(p_{1} - p_{2})$.
The standard deviation of the sampling distribution of $({\hat{p}}_{1} - {\hat{p}}_{2})$ is

$\sigma_{({\hat{p}}_{1} - {\hat{p}}_{2})} = \sqrt{\frac{p_{1} q_{1}}{n_{1}} + \frac{p_{2} q_{2}}{n_{2}}}$

where $q_{1} = 1 - p_{1}$ and $q_{2} = 1 - p_{2}$.
If the sample sizes $n_{1}$ and $n_{2}$ are large (see Section 5.4 for a guideline), the sampling distribution of $({\hat{p}}_{1} - {\hat{p}}_{2})$ is approximately normal.
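
These properties can be checked empirically. The following is a minimal simulation sketch, not part of the original text: the population values, the sample sizes, the number of replications, and the function name `simulate_difference` are all hypothetical choices made for illustration. It draws many pairs of independent binomial samples and compares the simulated mean and standard deviation of the difference in sample proportions with the formulas in the box.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_difference(p1, n1, p2, n2, reps=2000, seed=1):
    """Simulate `reps` values of (p1_hat - p2_hat) from independent binomial samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Hypothetical population proportions and sample sizes (illustration only).
p1, n1, p2, n2 = 0.55, 200, 0.45, 200
diffs = simulate_difference(p1, n1, p2, n2)

# Standard deviation given by the formula in the box.
theoretical_sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"simulated mean {mean(diffs):.4f}  vs  p1 - p2 = {p1 - p2:.4f}")
print(f"simulated sd   {stdev(diffs):.4f}  vs  formula  = {theoretical_sd:.4f}")
```

With a few thousand replications, the simulated mean and standard deviation should land close to the theoretical values, and a histogram of `diffs` would look roughly bell shaped.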

Teaching Tip

The interpretations of confidence intervals have remained the same even though the formulas used to estimate the various parameters have changed. Use the interpretation of this confidence interval to point out that fact to the student.

Since the distribution of $({\hat{p}}_{1} - {\hat{p}}_{2})$ in repeated sampling is approximately normal, we can use the z-statistic to derive confidence intervals for $(p_{1} - p_{2})$ or to test a hypothesis about $(p_{1} - p_{2})$.

For the voter example, a 95% confidence interval for the difference $(p_{1} - p_{2})$ is

$({\hat{p}}_{1} - {\hat{p}}_{2}) \pm 1.96 \sigma_{({\hat{p}}_{1} - {\hat{p}}_{2})}, \quad \text{or} \quad ({\hat{p}}_{1} - {\hat{p}}_{2}) \pm 1.96 \sqrt{\frac{p_{1} q_{1}}{n_{1}} + \frac{p_{2} q_{2}}{n_{2}}}$

The quantities $p_{1} q_{1}$ and $p_{2} q_{2}$ must be estimated in order to complete the calculation of the standard deviation $\sigma_{({\hat{p}}_{1} - {\hat{p}}_{2})}$ and, hence, the calculation of the confidence interval. In Section 5.4, we showed that the value of $pq$ is relatively insensitive to the value chosen to approximate $p$. Therefore, ${\hat{p}}_{1} {\hat{q}}_{1}$ and ${\hat{p}}_{2} {\hat{q}}_{2}$ will provide satisfactory approximations of $p_{1} q_{1}$ and $p_{2} q_{2}$, respectively. Then

$\sqrt{\frac{p_{1} q_{1}}{n_{1}} + \frac{p_{2} q_{2}}{n_{2}}} \approx \sqrt{\frac{{\hat{p}}_{1} {\hat{q}}_{1}}{n_{1}} + \frac{{\hat{p}}_{2} {\hat{q}}_{2}}{n_{2}}}$

and we will approximate the 95% confidence interval by

$({\hat{p}}_{1} - {\hat{p}}_{2}) \pm 1.96 \sqrt{\frac{{\hat{p}}_{1} {\hat{q}}_{1}}{n_{1}} + \frac{{\hat{p}}_{2} {\hat{q}}_{2}}{n_{2}}}$

Substituting the sample quantities yields

$(.546 - .475) \pm 1.96 \sqrt{\frac{(.546) (.454)}{1{,}000} + \frac{(.475) (.525)}{1{,}000}}$

or $.071 \pm .044$. Thus, we are 95% confident that the interval from .027 to .115 contains $(p_{1} - p_{2})$.

We infer that the proportion of registered voters who plan to vote for the presidential candidate is between .027 and .115 (that is, 2.7 to 11.5 percentage points) higher in the northeast than in the southeast. It seems that the candidate should direct a greater campaign effort in the southeast than in the northeast.

Now Work Exercise 8.9

The general form of a confidence interval for the difference $(p_{1} - p_{2})$ between population proportions is given in the following box:

Teaching Tip

Point out that the t-distribution is not used with proportions because the underlying assumption of a normal distribution will never be true.

Large-Sample $100 (1 - \alpha)\%$ Confidence Interval for $(p_{1} - p_{2})$: Normal (z) Statistic

$\begin{array}{rcl} ({\hat{p}}_{1} - {\hat{p}}_{2}) \pm z_{\alpha / 2} \sigma_{({\hat{p}}_{1} - {\hat{p}}_{2})} & = & ({\hat{p}}_{1} - {\hat{p}}_{2}) \pm z_{\alpha / 2} \sqrt{\frac{p_{1} q_{1}}{n_{1}} + \frac{p_{2} q_{2}}{n_{2}}} \\ & \approx & ({\hat{p}}_{1} - {\hat{p}}_{2}) \pm z_{\alpha / 2} \sqrt{\frac{{\hat{p}}_{1} {\hat{q}}_{1}}{n_{1}} + \frac{{\hat{p}}_{2} {\hat{q}}_{2}}{n_{2}}} \end{array}$
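
The boxed interval translates directly into a few lines of code. The sketch below is one possible implementation using only the Python standard library; the function name `two_proportion_ci` and its parameter names are assumptions made for this illustration, not something the text defines. It obtains $z_{\alpha/2}$ from the standard normal quantile function and is applied to the voter poll of Table 8.1.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for (p1 - p2) from two independent samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    # Estimated standard deviation of (p1_hat - p2_hat), with q_hat = 1 - p_hat
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Voter poll from Table 8.1: northeast is sample 1, southeast is sample 2.
lower, upper = two_proportion_ci(546, 1000, 475, 1000)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

Rounded to three decimals, the result agrees with the interval from .027 to .115 found by hand. Changing `confidence` to .90 or .99 gives the corresponding intervals without looking up $z_{\alpha/2}$ in a table.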

The z-statistic,

$z = \frac{({\hat{p}}_{1} - {\hat{p}}_{2}) - (p_{1} - p_{2})}{\sigma_{({\hat{p}}_{1} - {\hat{p}}_{2})}}$

is used to test the null hypothesis that $(p_{1} - p_{2})$ equals some specified difference, say, $D_{0}$. For the special case where $D_{0} = 0$, that is, where we want to test the null hypothesis $H_{0} : (p_{1} - p_{2}) = 0$ (or, equivalently, $H_{0} : p_{1} = p_{2}$), the best estimate of $p_{1} = p_{2} = p$ is obtained by dividing the total number of successes $(x_{1} + x_{2})$ for the two samples by the total number of observations $(n_{1} + n_{2})$, that is,

$\hat{p} = \frac{x_{1} + x_{2}}{n_{1} + n_{2}}, \quad \text{or} \quad \hat{p} = \frac{n_{1} {\hat{p}}_{1} + n_{2} {\hat{p}}_{2}}{n_{1} + n_{2}}$

The second equation shows that $\hat{p}$ is a weighted average of ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$, with the larger sample receiving more weight. If the sample sizes are equal, then $\hat{p}$ is a simple average of the two sample proportions of successes.
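
The two forms of $\hat{p}$ agree because each success count can be written as $x_{i} = n_{i} {\hat{p}}_{i}$. Written out, with the weights $w_{1}$ and $w_{2}$ introduced here only as labels for illustration:

$\hat{p} = \frac{x_{1} + x_{2}}{n_{1} + n_{2}} = \frac{n_{1} {\hat{p}}_{1} + n_{2} {\hat{p}}_{2}}{n_{1} + n_{2}} = w_{1} {\hat{p}}_{1} + w_{2} {\hat{p}}_{2}, \qquad w_{1} = \frac{n_{1}}{n_{1} + n_{2}}, \quad w_{2} = \frac{n_{2}}{n_{1} + n_{2}}$

Since $w_{1} + w_{2} = 1$, the pooled estimate always lies between the two sample proportions. For the voter poll of Table 8.1, $n_{1} = n_{2} = 1{,}000$, so $w_{1} = w_{2} = 1/2$ and $\hat{p} = (546 + 475) / 2{,}000 = .5105$, the simple average of .546 and .475.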

We now substitute the weighted average $\hat{p}$ for both $p_{1}$ and $p_{2}$ in the formula for the standard deviation of $({\hat{p}}_{1} - {\hat{p}}_{2})$:

$\sigma_{({\hat{p}}_{1} - {\hat{p}}_{2})} = \sqrt{\frac{p_{1} q_{1}}{n_{1}} + \frac{p_{2} q_{2}}{n_{2}}} \approx \sqrt{\frac{\hat{p} \hat{q}}{n_{1}} + \frac{\hat{p} \hat{q}}{n_{2}}} = \sqrt{\hat{p} \hat{q} \left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}$

The test is summarized in the following box:

Teaching Tip

No matched-pairs experiment with proportions exists for this type of analysis.

Large-Sample Test of Hypothesis about $(p_{1} - p_{2})$: Normal (z) Statistic

Test statistic:	$z_{c} = \frac{({\hat{p}}_{1} - {\hat{p}}_{2})}{\sqrt{\frac{p_{1} q_{1}}{n_{1}} + \frac{p_{2} q_{2}}{n_{2}}}} \approx \frac{({\hat{p}}_{1} - {\hat{p}}_{2})}{\sqrt{\hat{p} \hat{q} \left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}}$
	where $\hat{p} = (x_{1} + x_{2}) / (n_{1} + n_{2})$
	One-Tailed Tests		Two-Tailed Test
	$H_{0} : p_{1} - p_{2} = 0$*	$H_{0} : p_{1} - p_{2} = 0$	$H_{0} : p_{1} - p_{2} = 0$
	$H_{a} : p_{1} - p_{2} < 0$	$H_{a} : p_{1} - p_{2} > 0$	$H_{a} : p_{1} - p_{2} \neq 0$
Rejection region:	$z_{c} < - z_{\alpha}$	$z_{c} > z_{\alpha}$	$| z_{c} | > z_{\alpha / 2}$
p-value:	$P (z < z_{c})$	$P (z > z_{c})$	$2 P (z > z_{c})$ if $z_{c}$ is positive
			$2 P (z < z_{c})$ if $z_{c}$ is negative
Decision: Reject $H_{0}$ if $\alpha > p\text{-value}$ or if the test statistic $(z_{c})$ falls in the rejection region, where $P (z > z_{\alpha}) = \alpha$, $P (z > z_{\alpha / 2}) = \alpha / 2$, and $\alpha = P (\text{Type I error}) = P (\text{Reject } H_{0} \mid H_{0} \text{ true})$.

*The test can be adapted to test for a difference in proportions $D_{0} \neq 0$. Because most applications require a test of equal proportions, $p_{1} = p_{2}$ (i.e., $p_{1} - p_{2} = D_{0} = 0$), we confine our attention to this case.
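
The boxed test can be sketched in code as follows. This is an illustrative implementation for the case $D_{0} = 0$ only; the function name `two_proportion_z_test` and the `alternative` labels ("less", "greater", "two-sided") are this sketch's own conventions, not notation from the text. It uses only the Python standard library and is applied to the voter poll of Table 8.1 purely as an example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Large-sample z-test of H0: p1 - p2 = 0 using the pooled estimate of p.

    alternative: "less"      for Ha: p1 - p2 < 0
                 "greater"   for Ha: p1 - p2 > 0
                 "two-sided" for Ha: p1 - p2 != 0
    Returns the test statistic z_c and its p-value.
    """
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z_c = (x1 / n1 - x2 / n2) / se

    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z_c)                   # P(z < z_c)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z_c)               # P(z > z_c)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z_c)))    # both tails
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z_c, p_value


# Voter poll from Table 8.1, two-tailed test at alpha = .05.
alpha = 0.05
z_c, p_value = two_proportion_z_test(546, 1000, 475, 1000)
print(f"z_c = {z_c:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Comparing the p-value with $\alpha$ and comparing $z_{c}$ with the rejection region lead to the same decision, so the sketch returns both quantities and leaves the choice to the caller.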

Conditions Required for Valid Large-Sample Inferences about $(p_{1} - p_{2})$

The two samples are randomly selected in an independent manner from the two target populations.
The sample sizes, $n_{1}$ and $n_{2}$, are both large, so the sampling distribution of $({\hat{p}}_{1} - {\hat{p}}_{2})$ will be approximately normal. (This condition will be satisfied if $n_{1} {\hat{p}}_{1} \geq 15$, $n_{1} {\hat{q}}_{1} \geq 15$, $n_{2} {\hat{p}}_{2} \geq 15$, and $n_{2} {\hat{q}}_{2} \geq 15$.)
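
The sample-size condition is easy to check mechanically. The sketch below relies on the fact that $n \hat{p}$ is simply the number of successes and $n \hat{q}$ the number of failures in a sample, so it compares counts directly. The function names are hypothetical, and the second call uses made-up small samples for contrast.

```python
def sample_is_large_enough(successes, n, threshold=15):
    """Check n * p_hat >= threshold and n * q_hat >= threshold for one sample.

    n * p_hat equals the number of successes and n * q_hat the number of
    failures, so the counts are compared directly.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold


def both_samples_large_enough(x1, n1, x2, n2, threshold=15):
    """The condition must hold for each of the two samples."""
    return (sample_is_large_enough(x1, n1, threshold)
            and sample_is_large_enough(x2, n2, threshold))


voter_poll_ok = both_samples_large_enough(546, 1000, 475, 1000)  # Table 8.1
small_samples_ok = both_samples_large_enough(5, 10, 6, 12)       # hypothetical data

print("Voter poll satisfies the condition:", voter_poll_ok)
print("Small hypothetical samples satisfy the condition:", small_samples_ok)
```

Working with counts rather than with $n \hat{p}$ computed from a rounded proportion avoids borderline cases being misjudged because of rounding.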

Example 8.1 A Large-Sample Test about $(p_{1} - p_{2})$: Comparing Fractions of Smokers for Two Years

Problem

In the past decade, intensive antismoking campaigns have been sponsored by both federal and private agencies. Suppose the American Cancer Society randomly sampled 1,500 adults in 2000 and then sampled 1,750 adults in 2010 to determine whether there was evidence that the percentage of smokers had decreased. The results of the two sample surveys are shown in Table 8.2, where $x_{1}$ and $x_{2}$ represent the numbers of smokers in the 2000 and 2010 samples, respectively. Do these data indicate that the fraction of smokers decreased over this 10-year period? Use $\alpha = .05$.

Table 8.2 Results of Smoking Survey

2000	2010
$n_{1} = 1{,}500$	$n_{2} = 1{,}750$
$x_{1} = 555$	$x_{2} = 578$

Solution

If we define $p_{1}$ and $p_{2}$ as the true proportions of adult smokers in 2000 and 2010, respectively, then the elements of our test are

$\begin{array}{l} H_{0} : (p_{1} - p_{2}) = 0 \\ H_{a} : (p_{1} - p_{2}) > 0 \end{array}$

(The test is one tailed, since we are interested only in determining whether the proportion of smokers decreased.)

Test statistic: $z = \frac{({\hat{p}}_{1} - {\hat{p}}_{2}) - 0}{\sigma_{({\hat{p}}_{1} - {\hat{p}}_{2})}}$
Rejection region using $\alpha = .05$: $z > z_{\alpha} = z_{.05} = 1.645$ (see Figure 8.1)

Figure 8.1

Rejection region for Example 8.1

We now calculate the sample proportions of smokers:

${\hat{p}}_{1} = \frac{555}{1{,}500} = .37 \qquad {\hat{p}}_{2} = \frac{578}{1{,}750} = .33$

Suggested Exercise 8.18
Then

$z = \frac{({\hat{p}}_{1} - {\hat{p}}_{2}) - 0}{\sigma_{({\hat{p}}_{1} - {\hat{p}}_{2})}} \approx \frac{({\hat{p}}_{1} - {\hat{p}}_{2})}{\sqrt{\hat{p} \hat{q} \left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}}$

where

$\hat{p} = \frac{x_{1} + x_{2}}{n_{1} + n_{2}} = \frac{555 + 578}{1{,}500 + 1{,}750} = .349$

Note that $\hat{p}$ is a weighted average of ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$, with more weight given to the larger (2010) sample. Thus, the computed value of the test statistic is

$z = \frac{.37 - .33}{\sqrt{(.349) (.651) \left(\frac{1}{1{,}500} + \frac{1}{1{,}750}\right)}} = \frac{.040}{.0168} = 2.38$

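Because $z = 2.38$ exceeds the critical value 1.645, the test statistic falls in the rejection region. There is sufficient evidence at the $\alpha = .05$ level to conclude that the proportion of adults who smoke has decreased over the 10-year period.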

Look Back

We could place a confidence interval on $(p_{1} - p_{2})$ if we were interested in estimating the extent of the decrease.

Now Work Exercise 8.14

Example 8.2 Finding the Observed Significance Level of a Test for $(p_{1} - p_{2})$

Problem

Use a statistical software package to conduct the test presented in Example 8.1. Find and interpret the p-value of the test.

Solution

We entered the sample sizes ($n_{1}$ and $n_{2}$) and numbers of successes ($x_{1}$ and $x_{2}$) into MINITAB and obtained the printout shown in Figure 8.2. The test statistic for this one-tailed test, $z = 2.37$, as well as the p-value of the test, are highlighted on the printout. (The software works with unrounded sample proportions, which is why its value differs slightly from the hand-computed $z = 2.38$ of Example 8.1.) Note that $p\text{-value} = .009$ is smaller than $\alpha = .05$. Consequently, we have strong evidence to reject $H_{0}$ and conclude that $p_{1}$ exceeds $p_{2}$.

Figure 8.2

MINITAB output for test of two proportions
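
For readers without MINITAB, the same quantities can be reproduced with a short script. The following is a sketch using only the Python standard library, not the software used in the text; the variable names are chosen here for illustration. It carries the sample proportions at full precision and computes the upper-tailed p-value for the data of Table 8.2.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 555, 1500   # smokers and sample size, 2000 sample (Table 8.2)
x2, n2 = 578, 1750   # smokers and sample size, 2010 sample (Table 8.2)

p1_hat, p2_hat = x1 / n1, x2 / n2          # unrounded sample proportions
p_pooled = (x1 + x2) / (n1 + n2)           # pooled estimate under H0: p1 = p2
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z = (p1_hat - p2_hat) / se
p_value = 1 - NormalDist().cdf(z)          # upper tail, since Ha: p1 - p2 > 0

print(f"z = {z:.2f}")
print(f"p-value = {p_value:.3f}")
```

To two and three decimals, respectively, these values match the $z = 2.37$ and p-value of .009 reported on the printout.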

As with a single population proportion, most studies designed to compare two population proportions employ large samples; consequently, the large-sample testing procedure based on the normal (z) statistic presented here will be appropriate for making inferences about $p_{1} - p_{2}$. However, in the case of small samples, tests for $p_{1} - p_{2}$ based on the z-statistic may not be valid. A test to compare proportions that can be applied to small samples utilizes Fisher’s exact test. We discuss this method in Section 8.4.

What Do You Do in the Small-Sample Case When Comparing Two Population Proportions?

Answer: Use Fisher’s exact test (see Section 8.4).

Exercises 8.1–8.23

Understanding the Principles

8.1 What conditions are required for valid large-sample inferences about $p_{1} - p_{2}$?
8.2 What is the problem with using the z-statistic to make inferences about $p_{1} - p_{2}$ when the sample sizes are both small?
8.3 Consider making an inference about $p_{1} - p_{2}$, where there are $x_{1}$ successes in $n_{1}$ binomial trials and $x_{2}$ successes in $n_{2}$ binomial trials.
1. Describe the distributions of $x_{1}$ and $x_{2}$.
  
  Binomial
2. For large samples, describe the sampling distribution of $({\hat{p}}_{1} - {\hat{p}}_{2})$.
  
  Normal

Learning the Mechanics

8.4 In each case, determine whether the sample sizes are large enough to conclude that the sampling distribution of $({\hat{p}}_{1} - {\hat{p}}_{2})$ is approximately normal.
1. $n_{1} = 10, n_{2} = 12, {\hat{p}}_{1} = .50, {\hat{p}}_{2} = .50$
  
  No
2. $n_{1} = 10, n_{2} = 12, {\hat{p}}_{1} = .10, {\hat{p}}_{2} = .08$
  
  No
3. $n_{1} = n_{2} = 30, {\hat{p}}_{1} = .20, {\hat{p}}_{2} = .30$
  
  No
4. $n_{1} = 100, n_{2} = 200, {\hat{p}}_{1} = .05, {\hat{p}}_{2} = .09$
  
  No
5. $n_{1} = 100, n_{2} = 200, {\hat{p}}_{1} = .95, {\hat{p}}_{2} = .91$
  
  No
8.5 Construct a 95% confidence interval for $(p_{1} - p_{2})$ in each of the following situations:
1. $n_{1} = 400, {\hat{p}}_{1} = .65; n_{2} = 400, {\hat{p}}_{2} = .58$
  
  $.07 \pm .067$
2. $n_{1} = 180, {\hat{p}}_{1} = .31; n_{2} = 250, {\hat{p}}_{2} = .25$
  
  $.06 \pm .086$
3. $n_{1} = 100, {\hat{p}}_{1} = .46; n_{2} = 120, {\hat{p}}_{2} = .61$
  
  $-.15 \pm .131$
8.6 Independent random samples, each containing 800 observations, were selected from two binomial populations. The samples from populations 1 and 2 produced 320 and 400 successes, respectively.
1. Test $H_{0} : (p_{1} - p_{2}) = 0$ against $H_{a} : (p_{1} - p_{2}) \neq 0$. Use $\alpha = .05$.
  
  $z = -4.02$
2. Test $H_{0} : (p_{1} - p_{2}) = 0$ against $H_{a} : (p_{1} - p_{2}) \neq 0$. Use $\alpha = .01$.
  
  $z = -4.02$
3. Test $H_{0} : (p_{1} - p_{2}) = 0$ against $H_{a} : (p_{1} - p_{2}) < 0$. Use $\alpha = .01$.
  
  $z = -4.02$
4. Form a 90% confidence interval for $(p_{1} - p_{2})$.
8.7 Random samples of size $n_{1} = 50$ and $n_{2} = 60$ were drawn from populations 1 and 2, respectively. The samples yielded ${\hat{p}}_{1} = .4$ and ${\hat{p}}_{2} = .2$. Test $H_{0} : (p_{1} - p_{2}) = .1$ against $H_{a} : (p_{1} - p_{2}) > .1$, using $\alpha = .05$.

$z = 1.16$
8.8 Sketch the sampling distribution of $({\hat{p}}_{1} - {\hat{p}}_{2})$ based on independent random samples of $n_{1} = 100$ and $n_{2} = 200$ observations from two binomial populations with probabilities of success $p_{1} = .1$ and $p_{2} = .5$, respectively.

Applying the Concepts—Basic

8.9 Bullying behavior study. School bullying is a form of aggressive behavior that occurs when a student is exposed repeatedly to negative actions (e.g., name-calling, hitting, kicking, spreading slander) from another student. In order to study the effectiveness of an antibullying policy at Dutch elementary schools, a survey of over 2,000 elementary school children was conducted (Health Education Research, Feb. 2005). Each student was asked if he or she ever bullied another student. In a sample of 1,358 boys, 746 claimed they had never bullied another student. In a sample of 1,379 girls, 967 claimed they had never bullied another student.
1. Estimate the true proportion of Dutch boys who have never bullied another student.
  
  .55
2. Estimate the true proportion of Dutch girls who have never bullied another student.
  
  .70
3. Estimate the difference in the proportions with a 90% confidence interval.
  
  $-.15 \pm .03$
4. Make a statement about how likely the interval you used in part c contains the true difference in proportions.
5. Which group is more likely to bully another student, Dutch boys or Dutch girls?
  
  Boys
8.10 Is steak your favorite barbeque food? July is National Grilling Month in the United States. A Harris Poll reported on a survey of Americans’ grilling preferences. When asked about their favorite food prepared on a barbeque grill, 662 of 1,250 randomly sampled Democrats preferred steak, as compared to 586 of 930 randomly sampled Republicans.
1. Give a point estimate for the proportion of all Democrats who prefer steak as their favorite barbeque food.
  
  .53
2. Give a point estimate for the proportion of all Republicans who prefer steak as their favorite barbeque food.
  
  .63
3. Give a point estimate for the difference between the proportions of all Democrats and all Republicans who prefer steak as their favorite barbeque food.
  
  $-.10$
4. Construct a 95% confidence interval for the difference between the proportions of all Democrats and all Republicans who prefer steak as their favorite barbeque food.
  
  $-.10 \pm .042$
5. Give a practical interpretation of the interval, part d.
6. Explain the meaning of the phrase “95% confident” in your answer to part e.
8.11 Hospital administration of malaria patients. One of the most serious health problems in India is malaria. Consequently, Indian hospital administrators must have the resources to treat the high volume of malaria patients that are admitted. Research published in the National Journal of Community Medicine (Vol. 1, 2010) investigated whether the malaria admission rate is higher in some months than in others. In a sample of 192 hospital patients admitted in January, 32 were treated for malaria. In an independent sample of 403 patients admitted in May (five months later), 34 were treated for malaria.
1. Describe the two populations of interest in this study.
2. Give a point estimate of the difference in the malaria admission rates in January and May.
  
  .083
3. Find a 90% confidence interval for the difference in the malaria admission rates in January and May.
  
  (.033, .133)
4. Based on the interval, part c, can you conclude that a difference exists in the true malaria admission rates in January and May? Explain.
  
  Yes
8.12 Influencing performance in a serial addition task. A classic psychological test involves adding a set of numbers (e.g., $1{,}000 + 40 + 1{,}000 + 30 + 1{,}000 + 20 + 1{,}000 + 10$) to evaluate cognitive behavior. In a study published in Advances in Cognitive Psychology (Jan. 2013), undergraduate students were all given the serial addition task, with the numbers presented on a computer screen. One group ($n_{1} = 60$ students) saw each of the numbers for 2 seconds on the screen. A second group ($n_{2} = 60$ students) saw all the numbers on the screen at the same time, with the number 1,000 presented in bright red. The correct answer, of course, is 4,100. In the first group, 17 answered correctly; in the second group, 12 answered correctly.
1. Compute the proportion of students in Group 1 that answered correctly.
  
  .283
2. Compute the proportion of students in Group 2 that answered correctly.
  
  .20
3. Why is a statistical test of hypothesis (or confidence interval) required to compare the sample proportions, parts a and b?
4. Conduct a test (at $\alpha = .01$) to compare the proportions. What inference can the researchers make regarding the two serial addition tasks?
  
  $z = 1.06$, fail to reject $H_{0}$
8.13 Web survey response rates. Response rates to Web surveys are typically low, partially due to users starting but not finishing the survey. The factors that influence response rates were investigated in Survey Methodology (Dec. 2013). In a designed study, Web users were directed to participate in one of several surveys with different formats. For example, one format utilized a welcome screen with a white background, and another format utilized a welcome screen with a red background. The “break-off rates,” i.e., the proportion of sampled users who break off the survey before completing all questions, for the two formats are provided in the table.

	White Welcome Screen	Red Welcome Screen
Number of Web users	190	183
Number who break off survey	49	37
Break-off rate	.258	.202

Source: Haer, R., and Meidert, N. “Does the first impression count? Examining the effect of the welcome screen design on the response rate.” Survey Methodology, Vol. 39, No. 2, Dec. 2013 (Table 4.1).
1. Verify the values of the break-off rates shown in the table.
2. The researchers theorize that the true break-off rate for Web users of the red welcome screen will be lower than the corresponding break-off rate for users of the white welcome screen. Give the null and alternative hypothesis for testing this theory.
  
  $H_{a} : p_{1} - p_{2} > 0$
3. Conduct the test, part b, at $\alpha = .10$. What do you conclude?
  
  $z = 1.27$, fail to reject $H_{0}$
8.14 Planning-habits survey. American Demographics (Jan. 2002) reported the results of a survey on the planning habits of men and women. In response to the question “What is your preferred method of planning and keeping track of meetings, appointments, and deadlines?” 56% of the men and 46% of the women answered “I keep them in my head.” A nationally representative sample of 1,000 adults participated in the survey; therefore, assume that 500 were men and 500 were women.
1. Set up the null and alternative hypotheses for testing whether the percentage of men who prefer keeping track of appointments in their head is larger than the corresponding percentage of women.
  
  $H_{0} : p_{1} = p_{2}$; $H_{a} : p_{1} > p_{2}$
2. Compute the test statistic for the test.
  
  $z = 3.16$
3. Give the rejection region for the test, using $\alpha = .01$.
4. Find the p-value for the test.
  
  $p \approx 0$
5. Draw the appropriate conclusion.
  
  Reject $H_{0}$

Applying the Concepts—Intermediate

8.15 Salmonella in produce. Salmonella is the most common type of bacterial food-borne illness in the United States. How prevalent is salmonella in produce grown in the major agricultural region of Monterey, California? Researchers from the United States Department of Agriculture (USDA) conducted tests for salmonella in produce grown in the region and published their results in Applied and Environmental Microbiology (Apr. 2011). In a sample of 252 cultures obtained from water used to irrigate the region, 18 tested positive for salmonella. In an independent sample of 476 cultures obtained from the region’s wildlife (e.g., birds), 20 tested positive for salmonella. Is this sufficient evidence for the USDA to state that the prevalence of salmonella in the region’s water differs from the prevalence of salmonella in the region’s wildlife? Use $\alpha = .01$ to make your decision.

No, $z = 1.70$
8.16 Study of armyworm pheromones. A study was conducted to determine the effectiveness of pheromones produced by two different strains of fall armyworms: the corn-strain and the rice-strain (Journal of Chemical Ecology, Mar. 2013). Both corn-strain and rice-strain male armyworms were released into a field containing a synthetic pheromone made from a corn-strain blend. A count of the number of males trapped by the pheromone was then determined. The experiment was conducted once in a corn field and then again in a grass field. The results are provided in the accompanying table.
1. Consider the corn field results. Construct a 90% confidence interval for the difference between the proportions of corn-strain and rice-strain males trapped by the pheromone.
  
  $(.062, .248)$
2. Consider the grass field results. Construct a 90% confidence interval for the difference between the proportions of corn-strain and rice-strain males trapped by the pheromone.
  
  $(.145, .259)$
3. Based on the confidence intervals, parts a and b, what can you conclude about the effectiveness of a corn-blend synthetic pheromone placed in a corn field? A grass field?
4. The researchers also want to compare the proportion of corn-strain males trapped in the corn field to the proportion of corn-strain males trapped in the grass field. Carry out this comparison using a hypothesis test (at $\alpha = .10$). What inference can you draw from the data?
5. Repeat part d for the proportions of rice-strain males trapped by the pheromone.
  
  $z = 1.16$, fail to reject $H_{0}$
  
	Corn Field	Grass Field
Number of corn-strain males released	112	215
Number trapped	86	164
Number of rice-strain males released	150	669
Number trapped	92	375
8.17 Traffic sign maintenance. The Federal Highway Administration (FHWA) recently issued new guidelines for maintaining and replacing traffic signs. Civil engineers at North Carolina State University conducted a study of the effectiveness of various sign maintenance practices developed to adhere to the new guidelines and published the results in the Journal of Transportation Engineering (June 2013). One portion of the study focused on the proportion of traffic signs that fail the minimum FHWA retroreflectivity requirements. Of 1,000 signs maintained by the North Carolina Department of Transportation (NCDOT), 512 were deemed failures. Of 1,000 signs maintained by county-owned roads in North Carolina, 328 were deemed failures. Conduct a test of hypothesis to determine whether the true proportions of traffic signs that fail the minimum FHWA retroreflectivity requirements differ depending on whether the signs are maintained by the NCDOT or by the county. Test using $\alpha = .05$.

$z = -8.34$, reject $H_{0}$
8.18 Angioplasty’s benefits challenged. Each year, more than 1 million heart patients undergo an angioplasty. The benefits of an angioplasty were challenged in a recent study of 2,287 patients (2007 Annual Conference of the American College of Cardiology, New Orleans). All the patients had substantial blockage of the arteries, but were medically stable. All were treated with medication such as aspirin and beta-blockers. However, half the patients were randomly assigned to get an angioplasty and half were not. After five years, the researchers found that 211 of the 1,145 patients in the angioplasty group had subsequent heart attacks, compared with 202 of 1,142 patients in the medication-only group. Do you agree with the study’s conclusion that “There was no significant difference in the rate of heart attacks for the two groups”? Support your answer with a 95% confidence interval.

$.0074 \pm .03153$
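The interval in Exercise 8.18 can be reproduced with a short calculation. This is a sketch assuming the standard large-sample confidence interval for $(p_{1} - p_{2})$, which uses the unpooled standard error; the helper name `diff_proportion_ci` is hypothetical, and the angioplasty group is taken as sample 1.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Return (point estimate, margin of error) for p1 - p2 (illustrative helper)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # A confidence interval does not assume p1 = p2, so the variances are not pooled.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p1_hat - p2_hat, z_crit * se


# Exercise 8.18 data: angioplasty group (sample 1) vs. medication-only group (sample 2).
diff, margin = diff_proportion_ci(211, 1145, 202, 1142)
lower, upper = diff - margin, diff + margin
contains_zero = lower < 0 < upper

print(f"{diff:.4f} +/- {margin:.5f} -> ({lower:.4f}, {upper:.4f})")
print("Interval contains 0:", contains_zero)
```

An interval that contains 0 is consistent with the study's conclusion of no significant difference in heart attack rates between the two groups.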
8.19 “Tip-of-the-tongue” study. Trying to think of a word you know, but can’t instantly retrieve, is called the “tip-of-the-tongue” phenomenon. Psychology and Aging (Sept. 2001) published a study of this phenomenon in senior citizens. The researchers compared 40 people between 60 and 72 years of age with 40 between 73 and 83 years of age. When primed with the initial syllable of a missing word (e.g., seeing the word include to help recall the word incisor), the younger seniors had a higher recall rate. Suppose 31 of the 40 seniors in the younger group could recall the word when primed with the initial syllable, while only 22 of the 40 seniors in the older group could recall the word. Compare the recall rates of the two groups, using $\alpha = .05$. Does one group of elderly people have a significantly higher recall rate than the other?

$z = 2.13$
8.20 Vulnerability of relying party Web sites. When you sign on to your Facebook account, you are granted access to more than 1 million relying party (RP) Web sites. This single sign-on (SSO) scheme is enabled by OAuth 2.0, an open and standardized Web resource authorization protocol. Although the protocol claims to be secure, there is anecdotal evidence of critical vulnerabilities that allow an attacker to gain unauthorized access to the user’s profile and to impersonate the victim on the RP Web site. Computer and systems engineers at the University of British Columbia investigated the vulnerability of relying party Web sites and published their results in the Proceedings of the 5th ACM Workshop on Computers & Communication Security (Oct. 2012). RP Web sites were categorized as server-flow or client-flow Web sites. Of the 40 server-flow sites studied, 20 were found to be vulnerable to impersonation attacks. Of the 54 client-flow sites examined, 41 were found to be vulnerable to impersonation attacks. Do these results indicate that a client-flow Web site is more likely to be vulnerable to an impersonation attack than a server-flow Web site? Test using $\alpha = .01$.

Yes, $z = -2.6$
8.21 Does sleep improve mental performance? Are creativity and problem solving linked to adequate sleep? This question was the subject of research conducted by German scientists at the University of Lübeck (Nature, Jan. 22, 2004). One hundred volunteers were divided into two equal-sized groups. Each volunteer took a math test that involved transforming strings of eight digits into a new string that fit a set of given rules, as well as a third, hidden rule. Prior to taking the test, one group received eight hours of sleep, while the other group stayed awake all night. The scientists monitored the volunteers to determine whether and when they figured out the third rule. Of the volunteers who slept, 39 discovered the third rule; of the volunteers who stayed awake all night, 15 discovered the third rule. From the study results, what can you infer about the proportions of volunteers in the two groups who discover the third rule? Support your answer with a 90% confidence interval.

$.48 \pm .144$

Applying the Concepts—Advanced

8.22 Religious symbolism in TV commercials. Gonzaga University professors conducted a study of television commercials and published their results in the Journal of Sociology, Social Work, and Social Welfare (Vol. 2, 2008). The key research question was: “Do television advertisers use religious symbolism to sell goods and services?” In a sample of 797 TV commercials collected in 1998, only 16 commercials used religious symbolism. Of the sample of 1,499 TV commercials examined in the more recent study, 51 commercials used religious symbolism. Conduct an analysis to determine if the percentage of TV commercials that use religious symbolism has changed since the 1998 study. If you detect a change, estimate the magnitude of the difference and attach a measure of reliability to the estimate.

$z = -1.90$; do not reject $H_{0}$
8.23 Teeth defects and stress in prehistoric Japan. Linear enamel hypoplasia (LEH) defects are pits or grooves on the tooth surface that are typically caused by malnutrition, chronic infection, stress, and trauma. A study of LEH defects in prehistoric Japanese cultures was published in the American Journal of Physical Anthropology (May 2010). Three groups of Japanese people were studied: Yayoi farmers (early agriculturists), eastern Jomon foragers (broad-based economy), and western Jomon foragers (wet rice economy). LEH defect prevalence was determined from skulls of individuals obtained from each of the three cultures. Of the 182 Yayoi farmers in the study, 63.1% had at least one LEH defect; of the 164 eastern Jomon foragers, 48.2% had at least one LEH defect; and, of the 122 western Jomon foragers, 64.8% had at least one LEH defect. Two theories were tested. Theory 1 states that foragers with a broad-based economy will have a lower LEH defect prevalence than early agriculturists. Theory 2 states that foragers with a wet rice economy will not differ in LEH defect prevalence from early agriculturists. Use the results to test both theories, each at $\alpha = .01$.

Based on Temple, D. H. “Patterns of systemic stress during the agricultural transition in prehistoric Japan.” American Journal of Physical Anthropology, Vol. 142, No. 1, May 2010.

Theory 1: $z = 2.79$, reject $H_{0}$;

Theory 2: $z = -.30$, do not reject $H_{0}$
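The Theory 1 statistic in Exercise 8.23 can be traced by hand. The sketch below assumes the Yayoi farmers are sample 1 and the eastern Jomon foragers are sample 2, uses $\hat{p}$ for the pooled proportion (a notational assumption), and recovers the pooled proportion directly from the reported percentages, so the implied counts are not whole numbers.

```latex
\begin{align*}
\hat{p} &= \frac{n_{1}\hat{p}_{1} + n_{2}\hat{p}_{2}}{n_{1} + n_{2}}
         = \frac{182(.631) + 164(.482)}{182 + 164} \approx .560 \\[4pt]
z &= \frac{\hat{p}_{1} - \hat{p}_{2}}
          {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_{1}} + \dfrac{1}{n_{2}}\right)}}
   = \frac{.631 - .482}
          {\sqrt{(.560)(.440)\left(\dfrac{1}{182} + \dfrac{1}{164}\right)}}
   \approx \frac{.149}{.0534} \approx 2.79
\end{align*}
```

Theory 1 is a one-tailed claim ($H_{a}: p_{1} - p_{2} > 0$), so at $\alpha = .01$ the rejection region is $z > z_{.01} \approx 2.33$. Since $2.79 > 2.33$, $H_{0}$ is rejected, which agrees with the answer given for Theory 1.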
