You can find the appropriate sample size to estimate the difference between a pair of parameters with a specified sampling error (SE) and degree of reliability by using the method described in Section 5.5. That is, to estimate the difference between a pair of parameters correct to within SE units with confidence level $(1-\alpha)$, let $z_{\alpha/2}$ standard deviations of the sampling distribution of the estimator equal SE. Then solve for the sample size. To do this, you have to solve the problem for a specific ratio between $n_1$ and $n_2$. Most often, you will want to have equal sample sizes, that is, $n_1 = n_2 = n$. We will illustrate the procedure with two examples.
New fertilizer compounds are often advertised with the promise of increased crop yields. Suppose we want to compare the mean yield of wheat when a new fertilizer is used with the mean yield from a fertilizer in common use. The estimate of the difference in mean yield per acre is to be correct to within .25 bushel with a confidence coefficient of .95. If the sample sizes are to be equal, find the number of 1-acre plots of wheat assigned to each fertilizer.
To solve the problem, you need to know something about the variation in the bushels of yield per acre. Suppose that, from past records, you know that the yields of wheat possess a range of approximately 10 bushels per acre. You could then approximate $\sigma_1 = \sigma_2 = \sigma$ by letting the range equal $4\sigma$. Thus,
$$4\sigma \approx 10 \text{ bushels}, \qquad \sigma \approx 2.5 \text{ bushels}$$
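The next step is to solve the equation
$$z_{\alpha/2}\,\sigma_{(\bar{x}_1-\bar{x}_2)} = \text{SE}, \quad\text{or}\quad z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} = \text{SE}$$
for $n$, where $n = n_1 = n_2$. Since we want our estimate to lie within $\text{SE} = .25$ of $(\mu_1-\mu_2)$ with confidence coefficient equal to .95, we have $z_{\alpha/2} = z_{.025} = 1.96$. Then, letting $\sigma_1 = \sigma_2 = 2.5$ and solving for $n$, we get
$$1.96\sqrt{\frac{(2.5)^2}{n}+\frac{(2.5)^2}{n}} = .25 \;\Longrightarrow\; n = \frac{(1.96)^2\,2(2.5)^2}{(.25)^2} = 768.32 \approx 769$$
(rounding up to the next whole number).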
Consequently, you will have to sample 769 acres of wheat for each fertilizer to estimate the difference in mean yield per acre to within .25 bushel.
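As a check on the arithmetic, the short Python sketch below solves $z_{\alpha/2}\sqrt{\sigma_1^2/n + \sigma_2^2/n} = \text{SE}$ for $n$ and rounds up. It assumes, as in the example, that $\sigma_1 = \sigma_2 \approx 10/4 = 2.5$ bushels and takes $z_{.025} = 1.96$ from the normal table; the function name `equal_sample_size` is illustrative, not part of any library.

```python
import math


def equal_sample_size(z: float, sigma1: float, sigma2: float, se: float) -> int:
    """Common size n = n1 = n2 so that z * sqrt(sigma1^2/n + sigma2^2/n) = se.

    Solving for n gives n = z^2 (sigma1^2 + sigma2^2) / se^2. The result is
    rounded up because the sample size must be a whole number at least that large.
    """
    n_exact = z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / se ** 2
    return math.ceil(n_exact)


# Range rule: a range of about 10 bushels gives sigma ~ 10 / 4 = 2.5.
sigma = 10 / 4

# SE = .25 bushel and confidence coefficient .95, so z_{.025} = 1.96.
n = equal_sample_size(z=1.96, sigma1=sigma, sigma2=sigma, se=0.25)
print(n)  # 769 plots of wheat per fertilizer
```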
Since $n = 769$ would necessitate extensive and costly experimentation, you might decide to allow a larger sampling error in order to reduce the sample size, or you might decrease the confidence coefficient. The point is that we can obtain an idea of the experimental effort necessary to achieve a specified precision in our final estimate by determining the approximate sample size before the experiment is begun.
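To make this trade-off concrete, the sketch below recomputes the common sample size for several sampling errors and two confidence levels, keeping $\sigma \approx 2.5$ bushels from the wheat example. The SE values and confidence levels in the loop are illustrative choices, not values from the text. Here $z_{\alpha/2}$ comes from Python's `statistics.NormalDist().inv_cdf` instead of a table, so it is unrounded (1.95996 rather than 1.96); for this example that does not change the rounded-up answer of 769 at 95% confidence.

```python
import math
from statistics import NormalDist


def equal_sample_size(confidence: float, sigma: float, se: float) -> int:
    """n = n1 = n2 for estimating mu1 - mu2 when sigma1 = sigma2 = sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # unrounded z_{alpha/2}
    # n = z^2 (sigma^2 + sigma^2) / SE^2, rounded up to a whole number
    return math.ceil(z ** 2 * 2 * sigma ** 2 / se ** 2)


sigma = 2.5  # range-rule estimate from the wheat example
for confidence in (0.95, 0.90):
    for se in (0.25, 0.50, 0.75):
        n = equal_sample_size(confidence, sigma, se)
        print(f"confidence={confidence:.2f}  SE={se:.2f}  n={n}")
```

Doubling SE cuts $n$ by roughly a factor of four, because $n$ is proportional to $1/\text{SE}^2$.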
Now Work Exercise 7.56
A laboratory manager wishes to compare the difference in the mean reading of two instruments, A and B, designed to measure the potency (in parts per million) of an antibiotic. To conduct the experiment, the manager plans to select $n_d$ specimens of the antibiotic from a vat and to measure each specimen with both instruments. The difference $\mu_d$ will be estimated based on the $n_d$ paired differences obtained in the experiment. If preliminary measurements suggest that the differences will range between plus or minus 10 parts per million, how many differences will be needed to estimate $\mu_d$ correct to within 1 part per million with confidence equal to .99?
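The estimator for $\mu_d$, based on a paired difference experiment, is $\bar{x}_d = (\bar{x}_1 - \bar{x}_2)$ and
$$\sigma_{\bar{x}_d} = \frac{\sigma_d}{\sqrt{n_d}}$$
Thus, the number $n_d$ of pairs of measurements needed to estimate $\mu_d$ to within 1 part per million can be obtained by solving for $n_d$ in the equation
$$z_{\alpha/2}\left(\frac{\sigma_d}{\sqrt{n_d}}\right) = \text{SE}$$
where $z_{\alpha/2} = z_{.005} = 2.58$ and $\text{SE} = 1$. To solve this equation for $n_d$, we need to have an approximate value for $\sigma_d$.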
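We are given the information that the differences are expected to range from $-10$ to 10 parts per million. Letting the range equal $4\sigma_d$, we find
$$\text{Range} = 20 \approx 4\sigma_d, \qquad \sigma_d \approx 5$$
Substituting $\sigma_d = 5$, $\text{SE} = 1$, and $z_{.005} = 2.58$ into the equation and solving for $n_d$, we obtain
$$2.58\left(\frac{5}{\sqrt{n_d}}\right) = 1 \;\Longrightarrow\; n_d = \left[\frac{(2.58)(5)}{1}\right]^2 = 166.41 \approx 167$$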
Therefore, it will require $n_d = 167$ pairs of measurements to estimate $\mu_d$ correct to within 1 part per million using the paired difference experiment.
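The paired-difference calculation can be sketched in the same way. The hypothetical helper `paired_sample_size` solves $z_{\alpha/2}\,\sigma_d/\sqrt{n_d} = \text{SE}$ for $n_d$ and rounds up. With the rounded table value $z_{.005} = 2.58$ and $\sigma_d \approx 20/4 = 5$, it gives $n_d = 167$; with the unrounded $z_{.005} \approx 2.5758$ it gives 166, a reminder that the last digit of a sample-size answer depends on how $z_{\alpha/2}$ is rounded.

```python
import math
from statistics import NormalDist


def paired_sample_size(z: float, sigma_d: float, se: float) -> int:
    """Number of pairs n_d solving z * sigma_d / sqrt(n_d) = se, rounded up."""
    return math.ceil((z * sigma_d / se) ** 2)


# Differences range from -10 to 10 ppm, so the range is 20 and sigma_d ~ 20 / 4.
sigma_d = (10 - (-10)) / 4

# Rounded table value z_{.005} = 2.58 for 99% confidence; SE = 1 ppm.
print(paired_sample_size(2.58, sigma_d, 1.0))  # 167

# Unrounded z_{.005} from the standard normal distribution.
z_exact = NormalDist().inv_cdf(0.995)
print(paired_sample_size(z_exact, sigma_d, 1.0))
```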
Now Work Exercise 7.68
The procedures for determining sample sizes necessary for estimating $(\mu_1 - \mu_2)$ for the case $n_1 = n_2$ and for $\mu_d$ in a paired difference experiment are given in the following boxes:
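To estimate $(\mu_1 - \mu_2)$ to within a given sampling error SE and with confidence level $(1-\alpha)$, use the following formula to solve for equal sample sizes that will achieve the desired reliability:
$$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(\text{SE})^2}$$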
You will need to substitute estimates for the values of $\sigma_1^2$ and $\sigma_2^2$ before solving for the sample size. These estimates might be sample variances $s_1^2$ and $s_2^2$ from prior sampling (e.g., a pilot study) or from an educated (and conservatively large) guess based on the range, that is, $s \approx R/4$.
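The two sources of variance estimates mentioned above can be sketched as small helper functions: `statistics.variance` returns the sample variance $s^2$ (divisor $n-1$) of pilot data, and the range rule gives $s^2 \approx (R/4)^2$. The helper names and the pilot yields in the usage lines are hypothetical, made up only to show how the pieces fit together; $z_{.025} = 1.96$ and SE = .25 are borrowed from the wheat example.

```python
import math
import statistics
from typing import Sequence


def variance_from_range(data_range: float) -> float:
    """Conservative variance guess from the range rule s ~ R/4, so s^2 ~ (R/4)^2."""
    return (data_range / 4) ** 2


def variance_from_pilot(pilot: Sequence[float]) -> float:
    """Sample variance s^2 (divisor n - 1) from pilot-study data."""
    return statistics.variance(pilot)


def equal_sample_size(z: float, var1: float, var2: float, se: float) -> int:
    """n1 = n2 = z^2 (sigma1^2 + sigma2^2) / SE^2, rounded up."""
    return math.ceil(z ** 2 * (var1 + var2) / se ** 2)


# Hypothetical pilot yields (bushels per acre) for population 1; illustration only.
pilot_1 = [31.0, 34.5, 29.0, 36.0, 33.5]
var1 = variance_from_pilot(pilot_1)
# No pilot data for population 2: fall back on a range of about 10 bushels.
var2 = variance_from_range(10)
print(equal_sample_size(1.96, var1, var2, 0.25))
```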
To estimate $\mu_d$ to within a given sampling error SE and with confidence level $(1-\alpha)$, use the following formula to solve for the number of pairs, $n_d$, that will achieve the desired reliability:
$$n_d = \frac{(z_{\alpha/2})^2\,\sigma_d^2}{(\text{SE})^2}$$
You will need to substitute an estimate for the value of $\sigma_d$, the standard deviation of the paired differences, before solving for the sample size.
Note: When estimating $\mu_1 - \mu_2$, you may desire one sample size to be a multiple of the other, e.g., $n_2 = a(n_1)$, where $a$ is an integer. For example, you may want to sample twice as many experimental units in the second sample as in the first. Then $a = 2$ and $n_2 = 2(n_1)$. For this unequal sample size case, slight adjustments are made to the computing formula. This formula (proof omitted) is provided below for convenience.
$$n_1 = \frac{(z_{\alpha/2})^2(a\sigma_1^2 + \sigma_2^2)}{a(\text{SE})^2}, \qquad n_2 = a(n_1)$$
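The unequal-size formula can be checked numerically. The hypothetical helper below computes $n_1 = (z_{\alpha/2})^2(a\sigma_1^2 + \sigma_2^2)/[a(\text{SE})^2]$, rounds up, and sets $n_2 = a(n_1)$. The inputs in the usage lines ($\sigma_1 = \sigma_2 = 2.5$, SE = .25, $z = 1.96$) are borrowed from the wheat example for illustration, and $a = 1$ reproduces the equal-size answer.

```python
import math
from typing import Tuple


def unequal_sample_sizes(z: float, sigma1: float, sigma2: float,
                         se: float, a: int) -> Tuple[int, int]:
    """Return (n1, n2) with n2 = a * n1 for estimating mu1 - mu2 to within se.

    Solves z * sqrt(sigma1^2/n1 + sigma2^2/(a*n1)) = se for n1:
        n1 = z^2 (a*sigma1^2 + sigma2^2) / (a * se^2)
    """
    n1 = math.ceil(z ** 2 * (a * sigma1 ** 2 + sigma2 ** 2) / (a * se ** 2))
    return n1, a * n1


# a = 1 reproduces the equal-size answer from the wheat example.
print(unequal_sample_sizes(1.96, 2.5, 2.5, 0.25, a=1))  # (769, 769)
# Twice as many plots for the second fertilizer.
print(unequal_sample_sizes(1.96, 2.5, 2.5, 0.25, a=2))  # (577, 1154)
```

With $a = 2$ the design needs 577 + 1154 = 1731 plots in total, compared with 2(769) = 1538 for equal sample sizes; when $\sigma_1 = \sigma_2$, splitting the observations equally is the most efficient allocation, which is one reason equal sample sizes are the usual choice.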
7.53 In determining the sample sizes for estimating $\mu_1 - \mu_2$, how do you obtain estimates of the population variances $\sigma_1^2$ and $\sigma_2^2$ used in the calculations?
7.54 When determining the sample size for estimating $\mu_d$, how do you obtain an estimate of the population variance $\sigma_d^2$ used in the calculations?
7.55 If the sample-size calculation yields a value of n that is too large to be practical, how should you proceed?
7.56 Find the appropriate values of $n_1$ and $n_2$ (assume that $n_1 = n_2$) needed to estimate $(\mu_1 - \mu_2)$ with
a. A sampling error equal to 3.2 with 95% confidence. From prior experience, it is known that and
b. A sampling error equal to 8 with 99% confidence. The range of each population is 60.
c. A 90% confidence interval of width 1.0. Assume that and
7.57 Suppose you want to estimate the difference between two population means correct to within 2.2 with probability .95. If prior information suggests that the population variances are approximately equal to and you want to select independent random samples of equal size from the populations, how large should the sample sizes, $n_1$ and $n_2$, be?
7.58 Enough money has been budgeted to collect paired observations from populations 1 and 2 in order to estimate . Prior information indicates that . Have sufficient funds been allocated to construct a 90% confidence interval for of width 5 or less? Justify your answer.
7.59 Hygiene of handshakes, high fives, and fist bumps. Refer to the American Journal of Infection Control (Aug. 2014) study of the hygiene of hand greetings, Exercise 7.22 (p. 385). The number of bacteria transferred from a gloved hand dipped into a culture of bacteria to a second gloved hand contacted by either a handshake, high five, or fist bump was recorded. Recall that the experiment was replicated only five times for each contact method and the data were used to compare the mean percentage of bacteria transferred for any two contact methods. Therefore, for this independent-samples design, $n_1 = n_2 = 5$. Suppose you want to estimate the difference between the mean percentage of bacteria transferred for the handshake and fist bump greetings to within 10% using a 95% confidence interval.
a. Define the parameter of interest in this study.
b. Give the value of $z_{\alpha/2}$ for the confidence interval.
c. What is the desired sampling error, SE?
d. From the data provided in Exercise 7.22, find estimates of the variances, $\sigma_1^2$ and $\sigma_2^2$, for the two contact methods.
e. Use the information in parts a–d to calculate the number of replicates for each contact method required to obtain the desired reliability. Assume an equal number of replicates.
7.60 Laughter among deaf signers. Refer to the Journal of Deaf Studies and Deaf Education (Fall 2006) paired difference study on vocalized laughter among deaf users of sign language, presented in Exercise 7.42 (p. 396). Suppose you want to estimate $\mu_d$, the difference between the population mean number of laugh episodes of deaf speakers and deaf audience members, using a 90% confidence interval with a sampling error of .75. Find the number of pairs of deaf people required to obtain such an estimate, assuming that the variance of the paired differences is .
7.61 Bulimia study. Refer to the American Statistician (May 2001) study comparing the “fear of negative evaluation” (FNE) scores for bulimic and normal female students, presented in Exercise 7.17 (p. 383). Suppose you want to estimate the difference between the population means of the FNE scores for bulimic and normal female students, using a 95% confidence interval with a sampling error of two points. Find the sample sizes required to obtain such an estimate. Assume equal sample sizes of $n_1 = n_2$.
7.62 Last name and acquisition timing. Refer to the Journal of Consumer Research (Aug. 2011) study of the last name effect in acquisition timing, Exercise 7.12 (p. 382). Recall that the mean response times (in minutes) to acquire basketball tickets were compared for two groups of MBA students: those students with last names beginning with one of the first 9 letters of the alphabet and those with last names beginning with one of the last 9 letters of the alphabet. How many MBA students from each group would need to be selected to estimate the difference in mean times to within 2 minutes of its true value with 95% confidence? (Assume equal sample sizes will be selected for each group and that the response time standard deviation for both groups is .)
SOLAR 7.63 Solar energy generation along highways. Refer to the International Journal of Energy and Environmental Engineering (Dec. 2013) study of solar energy generation along highways, Exercise 7.45 (p. 397). Recall that the researchers compared the mean monthly amount of solar energy generated by east-west and north-south oriented solar panels using a matched-pairs experiment. However, a small sample of only five months was used for the analysis. How many more months would need to be selected in order to estimate the difference in means to within 25 kilowatt hours with a 90% confidence interval? Use the information provided in the SOLAR file to find an estimate of the standard deviation of the paired differences, $\sigma_d$, required to carry out the calculation.
7.64 Do video game players have superior visual attention skills? Refer to the Journal of Articles in Support of the Null Hypothesis (Vol. 6, 2009) study comparing the visual attention skill of video game and non–video game players, Exercise 7.20 (p. 384). Recall that there was no significant difference between the mean score on the attentional blink test of video game players and the corresponding mean for non–video game players. It is possible that selecting larger samples would yield a significant difference. How many video game and non–video game players would need to be selected in order to estimate the difference in mean score for the two groups to within 5 points with 95% confidence? (Assume equal sample sizes will be selected from the two groups and that the score standard deviation for both groups is .)
7.65 Scouting an NFL free agent. In seeking a free-agent NFL running back, a general manager is looking for a player with high mean yards gained per carry and a small standard deviation. Suppose the GM wishes to compare the mean yards gained per carry for two free agents, on the basis of independent random samples of their yards gained per carry. Data from last year’s pro football season indicate that yards. If the GM wants to estimate the difference in means correct to within 1 yard with a confidence level of .90, how many runs would have to be observed for each player? (Assume equal sample sizes.)