8.2 Determining the Sample Size

The sample sizes n₁ and n₂ required to compare two population proportions can be found in a manner similar to the method described in Section 7.4 for comparing two population means. We will assume equal-sized samples (i.e., $n_{1} = n_{2} = n$) and then choose n so that $({\hat{p}}_{1} - {\hat{p}}_{2})$ will differ from $(p_{1} - p_{2})$ by no more than a sampling error SE with a specified level of confidence. We will illustrate the procedure with an example.

Example 8.3 Finding the Sample Sizes for Estimating $(p_{1} - p_{2})$: Comparing Defect Rates of Two Machines

Problem

A production supervisor suspects that a difference exists between the proportions p₁ and p₂ of defective items produced by two different machines. Experience has shown that the proportion defective for each of the two machines is in the neighborhood of .03. If the supervisor wants to estimate the difference in the proportions to within .005, using a 95% confidence interval, how many items must be randomly sampled from the output produced by each machine? (Assume that the supervisor wants $n_{1} = n_{2} = n$.)

Solution

In this sampling problem, the sampling error $SE = .005$, and for the specified level of reliability, $z_{α / 2} = z_{.025} = 1.96$. Then, letting $p_{1} = p_{2} = .03$ and $n_{1} = n_{2} = n$, we find the required sample size per machine by solving the following equation for n:

$z_{α / 2} σ_{({\hat{p}}_{1} - {\hat{p}}_{2})} = SE$

or

$\begin{array}{rcl} z_{α / 2} \sqrt{\frac{p_{1} q_{1}}{n_{1}} + \frac{p_{2} q_{2}}{n_{2}}} & = & SE \\ 1.96 \sqrt{\frac{(.03) (.97)}{n} + \frac{(.03) (.97)}{n}} & = & .005 \\ 1.96 \sqrt{\frac{2 (.03) (.97)}{n}} & = & .005 \\ n & = & 8{,}943.2 \end{array}$

Look Back

This large n will likely result in a tedious sampling procedure. If the supervisor insists on estimating $(p_{1} - p_{2})$ correct to within .005 with 95% confidence, approximately 9,000 items will have to be inspected for each machine.
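
The step from the last equation of the solution to the final value of n amounts to squaring both sides and isolating n. Written out as a worked restatement of that algebra, using the same values as the example:

$\begin{array}{rcl} (1.96)^{2} \cdot \frac{2 (.03) (.97)}{n} & = & (.005)^{2} \\ n & = & \frac{(1.96)^{2} \cdot 2 (.03) (.97)}{(.005)^{2}} \\ & = & \frac{(3.8416) (.0582)}{.000025} \\ & \approx & 8{,}943.2 \end{array}$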

Now Work Exercise 8.26a

You can see from the calculations in Example 8.3 that $σ_{({\hat{p}}_{1} - {\hat{p}}_{2})}$ (and hence the solution, $n_{1} = n_{2} = n$) depends on the actual (but unknown) values of p₁ and p₂. In fact, the required sample size $n_{1} = n_{2} = n$ is largest when $p_{1} = p_{2} = .5$. Therefore, if you have no prior information on the approximate values of p₁ and p₂, use $p_{1} = p_{2} = .5$ in the formula for $σ_{({\hat{p}}_{1} - {\hat{p}}_{2})}$. If p₁ and p₂ are in fact close to .5, then the values of n₁ and n₂ that you have calculated will be correct. If p₁ and p₂ differ substantially from .5, then your solutions for n₁ and n₂ will be larger than needed. Consequently, using $p_{1} = p_{2} = .5$ when solving for n₁ and n₂ is a conservative procedure because the sample sizes n₁ and n₂ will be at least as large as (and probably larger than) needed.
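
To see why $p_{1} = p_{2} = .5$ is the conservative choice, the short Python sketch below tabulates the required n for several assumed values of p, using the equal-sample-size relationship from Example 8.3. The function name `required_n` and the grid of p values are illustrative choices, not part of the text; z = 1.96 and SE = .005 are the values from the example.

```python
def required_n(p, z=1.96, se=0.005):
    """Per-sample n when p1 = p2 = p and n1 = n2 = n (defaults from Example 8.3)."""
    q = 1.0 - p
    return z**2 * (p * q + p * q) / se**2

# Illustrative grid of assumed proportions (an assumption, not from the text).
grid = [0.03, 0.10, 0.30, 0.50, 0.70, 0.90, 0.97]
table = {p: required_n(p) for p in grid}

for p, n in table.items():
    print(f"p = {p:.2f}  ->  n = {n:,.1f}")

# The product p*q, and therefore n, peaks at p = .5.
worst_case_p = max(table, key=table.get)
```

Under these settings the required n rises from about 8,943 at p = .03 to 76,832 at p = .5, and it is symmetric about .5 because p and q simply trade places.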

The procedures for determining the sample sizes necessary for estimating $(p_{1} - p_{2})$ in the equal sample size case ($n_{1} = n_{2}$) and the unequal sample size case (e.g., $n_{2} = a n_{1}$) are given in the following boxes:

Determination of Sample Size for Estimating $p_{1} - p_{2}$: Equal Sample Size Case

To estimate $(p_{1} - p_{2})$ to within a given sampling error SE and with confidence level $(1 - α)$, use the following formula to solve for equal sample sizes that will achieve the desired reliability:

$n_{1} = n_{2} = \frac{{(z_{α / 2})}^{2} (p_{1} q_{1} + p_{2} q_{2})}{{(SE)}^{2}}$

You will need to substitute estimates for the values of p₁ and p₂ before solving for the sample size. These estimates might be based on prior samples, obtained from educated guesses, or, most conservatively, specified as $p_{1} = p_{2} = .5$.
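
As a minimal sketch, the equal-sample-size formula can be evaluated in Python as follows. The function name `equal_sample_size` and its argument order are assumptions made for illustration; the caller supplies the tabled value of $z_{α / 2}$ directly. The result is rounded up, since a fractional observation cannot be sampled and rounding down would slightly exceed the target SE.

```python
import math

def equal_sample_size(z, p1, p2, se):
    """Return the unrounded common sample size n1 = n2 from the boxed formula."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return z**2 * (p1 * q1 + p2 * q2) / se**2

# Example 8.3: z = 1.96, p1 = p2 = .03, SE = .005
n_raw = equal_sample_size(1.96, 0.03, 0.03, 0.005)
n_needed = math.ceil(n_raw)  # round up so the error bound is not exceeded
print(round(n_raw, 1), n_needed)
```

With the inputs of Example 8.3 this reproduces n ≈ 8,943.2, which rounds up to 8,944 items per machine.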

Adjustment to Sample Size Formula for Estimating $(p_{1} - p_{2})$ When $n_{2} = a (n_{1})$

$\begin{array}{ll} n_{1} = \frac{{(z_{α / 2})}^{2} (a p_{1} q_{1} + p_{2} q_{2})}{a {(SE)}^{2}} & n_{2} = a (n_{1}) \end{array}$
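
The adjusted formula follows from substituting $n_{2} = a n_{1}$ into the standard error, which gives $\sqrt{(a p_{1} q_{1} + p_{2} q_{2}) / (a n_{1})}$, and then solving $z_{α / 2}$ times that quantity $= SE$ for $n_{1}$. A minimal Python sketch of the calculation is shown below. The function name `unequal_sample_sizes` is hypothetical, and the inputs are illustrative: they mirror the setup of Exercise 8.31 (90% confidence with the usual table value z = 1.645, SE = .05, both proportions near .2, and sample 1 twice the size of sample 2, so a = 0.5).

```python
import math

def unequal_sample_sizes(z, p1, p2, se, a):
    """Return (n1, n2), each rounded up, for the case n2 = a * n1."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    n1 = z**2 * (a * p1 * q1 + p2 * q2) / (a * se**2)
    n2 = a * n1
    return math.ceil(n1), math.ceil(n2)

# Illustrative inputs (see lead-in): n1 = 2 * n2 means a = 0.5.
n1, n2 = unequal_sample_sizes(1.645, 0.2, 0.2, 0.05, 0.5)
print(n1, n2)
```

For these inputs the sketch gives n₁ = 520 and n₂ = 260. Setting a = 1 reduces the calculation to the equal sample size case.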

Exercises 8.24–8.33

Understanding the Principles

8.24 In determining the sample sizes for estimating $p_{1} - p_{2}$, how do you obtain estimates of the binomial proportions p₁ and p₂ used in the calculations?
8.25 If the sample-size calculation yields a value of n that is too large to be practical, how should you proceed?

Learning the Mechanics

8.26 Assuming that $n_{1} = n_{2}$, find the sample sizes needed to estimate $(p_{1} - p_{2})$ for each of the following situations:
1. $SE = .01$ with 99% confidence. Assume that $p_{1} \approx .4$ and $p_{2} \approx .7$.
  
  n₁ = n₂ = 29,954
2. A 90% confidence interval of width .05. Assume there is no prior information available with which to obtain approximate values of p₁ and p₂.
  
  n₁ = n₂ = 2165
3. $SE = .03$ with 90% confidence. Assume that $p_{1} \approx .2$ and $p_{2} \approx .3$.
  
  n₁ = n₂ = 1113
8.27 Enough money has been budgeted to collect independent random samples of size $n_{1} = n_{2} = 100$ from populations 1 and 2 in order to estimate $(p_{1} - p_{2})$. Prior information indicates that $p_{1} = p_{2} \approx .6$. Have sufficient funds been allocated to construct a 90% confidence interval for $(p_{1} - p_{2})$ of width .1 or less? Justify your answer.

n₁ = n₂ = 520

Applying the Concepts—Basic

8.28 Size of a political poll. A pollster wants to estimate the difference between the proportions of men and women who favor a particular national candidate using a 90% confidence interval of width .04. Suppose the pollster has no prior information about the proportions. If equal numbers of men and women are to be polled, how large should the sample sizes be?

n₁ = n₂ = 3383
8.29 Angioplasty’s benefits challenged. Refer to the study of patients with substantial blockage of the arteries presented at the 2007 Annual Conference of the American College of Cardiology, Exercise 8.18 (p. 457). Recall that half the patients were randomly assigned to get an angioplasty and half were not. The researchers compared the proportion of patients with subsequent heart attacks for the two groups and reported no significant difference between the two proportions. Although the study involved over 2,000 patients, the sample size may have been too small to detect a difference in heart attack rates.
1. How many patients must be sampled in each group in order to estimate the difference in heart attack rates to within .015 with 95% confidence? (Use summary data from Exercise 8.18 in your calculation.)
  
  n₁ = n₂ = 5051
2. Comment on the practicality of carrying out the study with the sample sizes determined in part a.
3. Comment on the practical significance of the difference detected in the confidence interval for the study, part a.
8.30 Influencing performance in a serial addition task. Refer to the Advances in Cognitive Psychology (Jan. 2013) study of influencing performance in a classic psychological test that involved adding a set of numbers, Exercise 8.12 (p. 456). Recall that one group of students saw each of the numbers for 2 seconds on the screen while a second group saw all the numbers on the screen at the same time, with the number 1,000 presented in bright red. The researchers want to estimate the true difference in the proportions of students in the two groups that give the correct answer with a 95% confidence interval. They want to know how many students to sample in each group in order to obtain an estimate that is no more than .10 from the true difference.
1. Identify the parameter of interest for this study.
  
  $p_{1} - p_{2}$
2. What is the desired confidence level?
  
  .95
3. What is the desired sampling error?
  
  .10
4. Find the sample size required to obtain the desired estimate. Assume $n_{1} = n_{2}$.
  
  140
5. Repeat part d, but now assume that twice as many students will be sampled in the first group as in the second group, i.e., assume $n_{1} = 2 n_{2}$.
  
  n₂ = 101

Applying the Concepts—Intermediate

8.31 Buyers of TVs. A manufacturer of large-screen televisions wants to compare with a competitor the proportions of its best sets that need repair within 1 year. If it is desired to estimate the difference between proportions to within .05 with 90% confidence, and if the manufacturer plans to sample twice as many buyers (n₁) of its sets as buyers (n₂) of the competitor’s sets, how many buyers of each brand must be sampled? Assume that the proportion of sets that need repair will be about .2 for both brands.

n₁ = 520, n₂ = 260
8.32 Cable-TV home shoppers. All cable television companies carry at least one home-shopping channel. Who uses these home-shopping services? Are the shoppers primarily men or women? Suppose you want to estimate the difference in the percentages of men and women who say they have used or expect to use televised home shopping. You want an 80% confidence interval of width .06 or less.
1. Approximately how many people should be included in your samples?
2. Suppose you want to obtain individual estimates for the two percentages of interest. Will the sample size found in part a be large enough to provide estimates of each percentage correct to within .02 with probability equal to .90? Justify your response.
  
  No
8.33 Rat damage in sugarcane. Poisons are used to prevent rat damage in sugarcane fields. The U.S. Department of Agriculture is investigating whether the rat poison should be located in the middle of the field or on the outer perimeter. One way to answer this question is to determine where the greater amount of damage occurs. If damage is measured by the proportion of cane stalks that have been damaged by rats, how many stalks from each section of the field should be sampled in order to estimate the true difference between proportions of stalks damaged in the two sections, to within .02 with 95% confidence?

4802
