8.2 Determining the Sample Size

The sample sizes n1 and n2 required to compare two population proportions can be found in a manner similar to the method described in Section 7.4 for comparing two population means. We will assume equal-sized samples (i.e., n1 = n2 = n) and then choose n so that (p̂1 − p̂2) will differ from (p1 − p2) by no more than a sampling error SE with a specified level of confidence. We will illustrate the procedure with an example.

Example 8.3 Finding the Sample Sizes for Estimating (p1 − p2): Comparing Defect Rates of Two Machines

Problem

  1. A production supervisor suspects that a difference exists between the proportions p1 and p2 of defective items produced by two different machines. Experience has shown that the proportion defective for each of the two machines is in the neighborhood of .03. If the supervisor wants to estimate the difference in the proportions to within .005, using a 95% confidence interval, how many items must be randomly sampled from the output produced by each machine? (Assume that the supervisor wants n1=n2=n.)

Solution

  1. In this sampling problem, the sampling error is SE = .005, and for the specified level of reliability, zα/2 = z.025 = 1.96. Then, letting p1 = p2 = .03 and n1 = n2 = n, we find the required sample size per machine by solving the following equation for n:

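    \[ z_{\alpha/2}\,\sigma_{(\hat{p}_1-\hat{p}_2)} = SE \]

    or

    \[ z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1}+\frac{p_2 q_2}{n_2}} = SE \]
    \[ 1.96\sqrt{\frac{(.03)(.97)}{n}+\frac{(.03)(.97)}{n}} = .005 \]
    \[ 1.96\sqrt{\frac{2(.03)(.97)}{n}} = .005 \]
    \[ n = 8{,}943.2 \]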

Look Back

This large n will likely result in a tedious sampling procedure. If the supervisor insists on estimating (p1− p2) correct to within .005 with 95% confidence, approximately 9,000 items will have to be inspected for each machine.
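
The arithmetic in Example 8.3 can be checked with a short Python sketch. The function name equal_sample_size and its parameter names are illustrative assumptions, not part of the text, and z is passed in directly (1.96, the rounded table value used in the solution) rather than computed.

```python
import math


def equal_sample_size(z, p1, p2, se):
    """Per-sample n solving z * sqrt(p1*q1/n + p2*q2/n) = se (hypothetical helper)."""
    q1, q2 = 1 - p1, 1 - p2
    return z ** 2 * (p1 * q1 + p2 * q2) / se ** 2


# Example 8.3: p1 = p2 = .03, SE = .005, 95% confidence (z = 1.96)
n = equal_sample_size(z=1.96, p1=0.03, p2=0.03, se=0.005)
print(round(n, 1))   # 8943.2
print(math.ceil(n))  # 8944, rounding up to a whole number of items
```

Rounding up rather than to the nearest integer keeps the sampling error at or below the target, which is why the sketch reports 8,944 items per machine.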

Now Work Exercise 8.26a

You can see from the calculations in Example 8.3 that σ(p̂1 − p̂2) (and hence the solution, n1 = n2 = n) depends on the actual (but unknown) values of p1 and p2. In fact, the required sample size n1 = n2 = n is largest when p1 = p2 = .5. Therefore, if you have no prior information on the approximate values of p1 and p2, use p1 = p2 = .5 in the formula for σ(p̂1 − p̂2). If p1 and p2 are in fact close to .5, then the values of n1 and n2 that you have calculated will be correct. If p1 and p2 differ substantially from .5, then your solutions for n1 and n2 will be larger than needed. Consequently, using p1 = p2 = .5 when solving for n1 and n2 is a conservative procedure because the sample sizes n1 and n2 will be at least as large as (and probably larger than) needed.
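
To see why p1 = p2 = .5 is the conservative choice, the following Python sketch evaluates the required n over a grid of assumed common proportions. The helper name n_for_common_p, the grid, and the reuse of SE = .005 with z = 1.96 from Example 8.3 are assumptions made only for illustration.

```python
def n_for_common_p(z, p, se):
    """Per-sample n when both proportions are assumed equal to p (hypothetical helper)."""
    return z ** 2 * 2 * p * (1 - p) / se ** 2


# Grid of assumed common proportions: .1, .2, ..., .9
grid = [k / 10 for k in range(1, 10)]
sizes = {p: n_for_common_p(z=1.96, p=p, se=0.005) for p in grid}

for p, n in sizes.items():
    print(f"p = {p:.1f}  n = {n:,.0f}")

# The largest requirement occurs where p * q peaks, at p = .5
worst_case = max(sizes, key=sizes.get)
print(worst_case)  # 0.5
```

Under these assumed inputs, the requirement at p = .5 is many times the roughly 8,943 found for p = .03, which illustrates how much a reasonable prior estimate of the proportions can save.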

The procedures for determining the sample sizes necessary for estimating (p1 − p2) in the equal-sample-size case (n1 = n2) and the unequal-sample-size case (e.g., n2 = an1) are given in the following boxes:

Determination of Sample Size for Estimating p1 − p2: Equal Sample Size Case

To estimate (p1 − p2) to within a given sampling error SE and with confidence level (1 − α), use the following formula to solve for equal sample sizes that will achieve the desired reliability:

\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2\,(p_1 q_1 + p_2 q_2)}{(SE)^2} \]

You will need to substitute estimates for the values of p1 and p2 before solving for the sample size. These estimates might be based on prior samples, obtained from educated guesses or, most conservatively, specified as p1=p2=.5.

Adjustment to Sample Size Formula for Estimating (p1 − p2) When n2 = a(n1)

\[ n_1 = \frac{(z_{\alpha/2})^2\,(a\,p_1 q_1 + p_2 q_2)}{a\,(SE)^2}, \qquad n_2 = a(n_1) \]
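
The adjusted formula follows from substituting n2 = a(n1) into the variance of the estimator, which gives p1q1/n1 + p2q2/(a n1) = (a p1q1 + p2q2)/(a n1). A minimal Python sketch covering both boxed cases is given here. The function name sample_sizes and its arguments are hypothetical, the numeric inputs are borrowed from Example 8.3 purely for illustration, and a = 2 is an assumed allocation ratio.

```python
import math


def sample_sizes(z, p1, p2, se, a=1.0):
    """Return (n1, n2) with n2 = a * n1 (hypothetical helper).

    With a = 1 this reduces to the equal-sample-size formula.
    """
    q1, q2 = 1 - p1, 1 - p2
    n1 = z ** 2 * (a * p1 * q1 + p2 * q2) / (a * se ** 2)
    return n1, a * n1


# Equal allocation reproduces Example 8.3 (about 8,943.2 per machine).
n1, n2 = sample_sizes(z=1.96, p1=0.03, p2=0.03, se=0.005)

# Assumed ratio a = 2: the second sample is twice the size of the first.
m1, m2 = sample_sizes(z=1.96, p1=0.03, p2=0.03, se=0.005, a=2)
print(math.ceil(m1), math.ceil(m2))

# Sanity check: the chosen sizes give back the target sampling error.
achieved = 1.96 * math.sqrt(0.03 * 0.97 / m1 + 0.03 * 0.97 / m2)
print(round(achieved, 6))
```

With these illustrative inputs, the unequal design needs a smaller first sample but a larger total n1 + n2 than the equal-size design.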

Exercises 8.24–8.33

Understanding the Principles

  1. 8.24 In determining the sample sizes for estimating p1− p2, how do you obtain estimates of the binomial proportions p1 and p2 used in the calculations?

  2. 8.25 If the sample-size calculation yields a value of n that is too large to be practical, how should you proceed?

Learning the Mechanics

  1. 8.26 Assuming that n1=n2, find the sample sizes needed to estimate ( p1− p2) for each of the following situations:

    1. SE = .01 with 99% confidence. Assume that p1 ≈ .4 and p2 ≈ .7.

    2. A 90% confidence interval of width .05. Assume there is no prior information available with which to obtain approximate values of p1 and p2.

    3. SE = .03 with 90% confidence. Assume that p1 ≈ .2 and p2 ≈ .3.

  2. 8.27 Enough money has been budgeted to collect independent random samples of size n1 = n2 = 100 from populations 1 and 2 in order to estimate (p1 − p2). Prior information indicates that p1 = p2 ≈ .6. Have sufficient funds been allocated to construct a 90% confidence interval for (p1 − p2) of width .1 or less? Justify your answer.

Applying the Concepts—Basic

  1. 8.28 Size of a political poll. A pollster wants to estimate the difference between the proportions of men and women who favor a particular national candidate using a 90% confidence interval of width .04. Suppose the pollster has no prior information about the proportions. If equal numbers of men and women are to be polled, how large should the sample sizes be?

  2. 8.29 Angioplasty’s benefits challenged. Refer to the study of patients with substantial blockage of the arteries presented at the 2007 Annual Conference of the American College of Cardiology, Exercise 8.18 (p. 457). Recall that half the patients were randomly assigned to get an angioplasty and half were not. The researchers compared the proportion of patients with subsequent heart attacks for the two groups and reported no significant difference between the two proportions. Although the study involved over 2,000 patients, the sample size may have been too small to detect a difference in heart attack rates.

    1. How many patients must be sampled in each group in order to estimate the difference in heart attack rates to within .015 with 95% confidence? (Use summary data from Exercise 8.18 in your calculation.)

    2. Comment on the practicality of carrying out the study with the sample sizes determined in part a.

    3. Comment on the practical significance of the difference detected in the confidence interval for the study, part a.

  3. 8.30 Influencing performance in a serial addition task. Refer to the Advances in Cognitive Psychology (Jan. 2013) study of influencing performance in a classic psychological test that involved adding a set of numbers, Exercise 8.12 (p. 456). Recall that one group of students saw each of the numbers for 2 seconds on the screen while a second group saw all the numbers on the screen at the same time, with the number 1,000 presented in bright red. The researchers want to estimate the true difference in the proportions of students in the two groups that give the correct answer with a 95% confidence interval. They want to know how many students to sample in each group in order to obtain an estimate that is no more than .10 from the true difference.

    1. Identify the parameter of interest for this study.

    2. What is the desired confidence level?

    3. What is the desired sampling error?

    4. Find the sample size required to obtain the desired estimate. Assume n1=n2.

    5. Repeat part d, but now assume that twice as many students will be sampled in the first group as in the second group, i.e., assume n1=2n2.

Applying the Concepts—Intermediate

  1. 8.31 Buyers of TVs. A manufacturer of large-screen televisions wants to compare with a competitor the proportions of its best sets that need repair within 1 year. If it is desired to estimate the difference between proportions to within .05 with 90% confidence, and if the manufacturer plans to sample twice as many buyers (n1) of its sets as buyers (n2) of the competitor’s sets, how many buyers of each brand must be sampled? Assume that the proportion of sets that need repair will be about .2 for both brands.

  2. 8.32 Cable-TV home shoppers. All cable television companies carry at least one home-shopping channel. Who uses these home-shopping services? Are the shoppers primarily men or women? Suppose you want to estimate the difference in the percentages of men and women who say they have used or expect to use televised home shopping. You want an 80% confidence interval of width .06 or less.

    1. Approximately how many people should be included in your samples?

    2. Suppose you want to obtain individual estimates for the two percentages of interest. Will the sample size found in part a be large enough to provide estimates of each percentage correct to within .02 with probability equal to .90? Justify your response.

  3. 8.33 Rat damage in sugarcane. Poisons are used to prevent rat damage in sugarcane fields. The U.S. Department of Agriculture is investigating whether the rat poison should be located in the middle of the field or on the outer perimeter. One way to answer this question is to determine where the greater amount of damage occurs. If damage is measured by the proportion of cane stalks that have been damaged by rats, how many stalks from each section of the field should be sampled in order to estimate the true difference between proportions of stalks damaged in the two sections, to within .02 with 95% confidence?
