In the previous sections, we considered interval estimation for population means or proportions. In this optional section, we discuss a confidence interval for a population variance, $\sigma^2$.
Recall Example 1.4 (p. 10) and the U.S. Army Corps of Engineers study of contaminated fish in the Tennessee River, Alabama. It is important for the Corps of Engineers to know how stable the weights of the contaminated fish are. That is, how large is the variation in the fish weights? The keyword “variation” indicates that the target population parameter is $\sigma^2$, the variance of the weights of all contaminated fish inhabiting the Tennessee River. Of course, the exact value of $\sigma^2$ will be unknown. Consequently, the Corps of Engineers wants to estimate its value with a high level of confidence.
Intuitively, it seems reasonable to use the sample variance, $s^2$, to estimate $\sigma^2$. However, unlike with sample means and proportions, the sampling distribution of $s^2$ does not follow a normal (z) distribution or a Student’s t distribution. Rather, when certain assumptions are satisfied (we discuss these later), the sampling distribution of $s^2$ possesses approximately a chi-square ($\chi^2$) distribution. The chi-square probability distribution, like the t distribution, is characterized by a quantity called the degrees of freedom (df) associated with the distribution. Several chi-square distributions with different df values are shown in Figure 5.20. You can see that unlike z and t distributions, the chi-square distribution is not symmetric about 0.
Degrees of Freedom | $\chi^2_{.100}$ | $\chi^2_{.050}$ | $\chi^2_{.025}$ | $\chi^2_{.010}$ | $\chi^2_{.005}$ |
---|---|---|---|---|---|
1 | 2.70554 | 3.84146 | 5.02389 | 6.63490 | 7.87944 |
2 | 4.60517 | 5.99147 | 7.37776 | 9.21034 | 10.5966 |
3 | 6.25139 | 7.81473 | 9.34840 | 11.3449 | 12.8381 |
4 | 7.77944 | 9.48773 | 11.1433 | 13.2767 | 14.8602 |
5 | 9.23635 | 11.0705 | 12.8325 | 15.0863 | 16.7496 |
6 | 10.6446 | 12.5916 | 14.4494 | 16.8119 | 18.5476 |
7 | 12.0170 | 14.0671 | 16.0128 | 18.4753 | 20.2777 |
8 | 13.3616 | 15.5073 | 17.5346 | 20.0902 | 21.9550 |
9 | 14.6837 | 16.9190 | 19.0228 | 21.6660 | 23.5893 |
10 | 15.9871 | 18.3070 | 20.4831 | 23.2093 | 25.1882 |
11 | 17.2750 | 19.6751 | 21.9200 | 24.7250 | 26.7569 |
12 | 18.5494 | 21.0261 | 23.3367 | 26.2170 | 28.2995 |
13 | 19.8119 | 22.3621 | 24.7356 | 27.6883 | 29.8194 |
14 | 21.0642 | 23.6848 | 26.1190 | 29.1413 | 31.3193 |
15 | 22.3072 | 24.9958 | 27.4884 | 30.5779 | 32.8013 |
16 | 23.5418 | 26.2962 | 28.8454 | 31.9999 | 34.2672 |
17 | 24.7690 | 27.5871 | 30.1910 | 33.4087 | 35.7185 |
18 | 25.9894 | 28.8693 | 31.5264 | 34.8053 | 37.1564 |
19 | 27.2036 | 30.1435 | 32.8523 | 36.1908 | 38.5822 |
The upper-tail areas for this distribution have been tabulated and are given in Table IV in Appendix B, a portion of which is reproduced in Table 5.7. The table gives the values of $\chi^2$, denoted as $\chi^2_\alpha$, that locate an area of $\alpha$ in the upper tail of the chi-square distribution; that is, $P(\chi^2 > \chi^2_\alpha) = \alpha$. As with the t-statistic, the degrees of freedom associated with $s^2$ are $(n-1)$. Thus, for $n = 10$ and an upper-tail value $\alpha = .05$, you will have $n - 1 = 9$ df and $\chi^2_{.05} = 16.9190$ (highlighted in Table 5.7).
[Note: Values of $\chi^2_\alpha$ can also be obtained using the inverse chi-square option of statistical software.]
The chi-square distribution is used to find a confidence interval for $\sigma^2$, as shown in the box. An illustrative example follows.
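As a rough sketch of what an inverse chi-square option does, the Python below (standard library only) evaluates the chi-square cumulative probability with a series for the incomplete gamma function and then inverts it by bisection. The function names `chi2_cdf` and `chi2_upper_critical` are illustrative assumptions, not the interface of MINITAB or of any particular package.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x).

    Uses the power series for the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > total * 1e-15:          # stop once terms no longer matter
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_upper_critical(alpha, df):
    """Value c such that the area to the right of c is alpha (bisection)."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid                      # too much area to the right: move up
        else:
            hi = mid
    return (lo + hi) / 2.0

# Compare with the 9-df row of the table for an upper-tail area of .05.
print(round(chi2_upper_critical(0.05, 9), 4))
```

The result can be checked against the tabulated entry for 9 df and an upper-tail area of .05.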
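A $100(1-\alpha)\%$ Confidence Interval for $\sigma^2$

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{(1-\alpha/2)}}$$

where $\chi^2_{\alpha/2}$ and $\chi^2_{(1-\alpha/2)}$ are values corresponding to an area of $\alpha/2$ in the right (upper) and left (lower) tails, respectively, of the chi-square distribution based on $(n-1)$ degrees of freedom.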
Conditions Required for a Valid Confidence Interval for $\sigma^2$
1. A random sample is selected from the target population.
2. The population of interest has a relative frequency distribution that is approximately normal.
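The form of the interval can be motivated by a short derivation. It is a sketch that assumes the two conditions above hold, so that the quantity $(n-1)s^2/\sigma^2$ has a chi-square distribution with $(n-1)$ degrees of freedom. Placing an area of $\alpha/2$ in each tail gives

$$P\left(\chi^2_{(1-\alpha/2)} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{\alpha/2}\right) = 1-\alpha.$$

Taking reciprocals (which reverses the inequalities) and multiplying through by $(n-1)s^2$ isolates $\sigma^2$:

$$P\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{(1-\alpha/2)}}\right) = 1-\alpha.$$

Note that the larger critical value, $\chi^2_{\alpha/2}$, ends up in the denominator of the lower endpoint.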
Refer to the U.S. Army Corps of Engineers study of contaminated fish in the Tennessee River. The Corps of Engineers has collected data for a random sample of 144 fish contaminated with DDT. (The engineers made sure to capture contaminated fish in several different randomly selected streams and tributaries of the river.) The fish weights (in grams) are saved in the FISHDDT file. The Army Corps of Engineers wants to estimate the true variation in fish weights in order to determine whether the fish are stable enough to allow further testing for DDT contamination.
a. Use the sample data to find a 95% confidence interval for the parameter of interest.
b. Determine whether the confidence interval, part a, is valid.
Here the target parameter is $\sigma^2$, the variance of the population of weights of contaminated fish. First, we need to find the sample variance, $s^2$, to compute the interval estimate. The MINITAB printout, Figure 5.21, gives descriptive statistics for the sample weights; the variance, $s^2$, is highlighted there.
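For $\alpha = .05$ (a 95% confidence interval), we require the critical values $\chi^2_{\alpha/2} = \chi^2_{.025}$ and $\chi^2_{(1-\alpha/2)} = \chi^2_{.975}$ for a chi-square distribution with $(n-1) = 143$ degrees of freedom. Examining Table IV in Appendix B, we see that these values are given for df = 100 and df = 150, but not for df = 143. We could approximate these critical chi-square values using the entries for df = 150 (the row closest to df = 143). Or we could use statistical software to obtain the exact values. The exact values, obtained using MINITAB, are shown on Figure 5.22; to one decimal place they are $\chi^2_{.025} \approx 178.0$ and $\chi^2_{.975} \approx 111.8$.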
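When the table has no row for the required df, one closed-form option is the Wilson-Hilferty approximation. It is not the method used in this section, but it tends to work well when df is large. The sketch below applies it with 143 df; the helper name `chi2_upper_critical_wh` is an assumption for illustration, and the results are approximations, not the exact MINITAB values.

```python
from statistics import NormalDist

def chi2_upper_critical_wh(alpha, df):
    """Approximate chi-square value with upper-tail area alpha (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)   # standard normal point with the same tail area
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3

df = 144 - 1                                 # n - 1 for the sample of 144 fish
upper = chi2_upper_critical_wh(0.025, df)    # area .025 in the upper tail
lower = chi2_upper_critical_wh(0.975, df)    # area .025 in the lower tail
print(f"{upper:.1f} {lower:.1f}")            # roughly 178.0 and 111.8
```

Dividing $(n-1)s^2$ by the larger value gives the lower endpoint of the interval, and dividing by the smaller value gives the upper endpoint.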
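Substituting the appropriate values into the formula given in the box, we obtain:

$$\frac{(144-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(144-1)s^2}{\chi^2_{.975}}$$

Or

$$113{,}907 \le \sigma^2 \le 181{,}371$$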
Thus, the Army Corps of Engineers can be 95% confident that the variance in weights of the population of contaminated fish ranges between 113,907 and 181,371. [Note: You can obtain this interval directly using statistical software as well. This interval is shown (highlighted) on the MINITAB printout, Figure 5.23. Our calculated values agree, except for rounding.]
According to the box, two conditions are required for the confidence interval to be valid. First, the sample must be randomly selected from the population. The Army Corps of Engineers did, indeed, collect a random sample of contaminated fish, making sure to sample fish from different locations in the Tennessee River. Second, the population data (the fish weights) must be approximately normally distributed. A MINITAB histogram for the sampled fish weights (with a normal curve superimposed) is displayed in Figure 5.24. Clearly, the data appear to be approximately normally distributed. Thus, the confidence interval is valid.
Will this confidence interval be practically useful in helping the Corps of Engineers decide whether the weights of the fish are stable? Only if it is clear what a weight variance of, say, 150,000 grams$^2$ implies. Most likely, the Corps of Engineers will want the interval in the same units as the weight measurement: grams. Consequently, a confidence interval for $\sigma$, the standard deviation of the population of fish weights, is desired. We demonstrate how to obtain this interval estimate in the next example.
Now Work Exercise 5.107a
Refer to Example 5.11. Find a 95% confidence interval for $\sigma$, the true standard deviation of the contaminated fish weights.
A confidence interval for $\sigma$ is obtained by taking the square roots of the lower and upper endpoints of a confidence interval for $\sigma^2$. Consequently, the 95% confidence interval for $\sigma$ is:

$$\sqrt{113{,}907} \le \sigma \le \sqrt{181{,}371}$$

Or,

$$337.5 \le \sigma \le 425.9$$
[Note: This interval is also shown on the MINITAB printout, Figure 5.23.]
Thus, the engineers can be 95% confident that the true standard deviation of fish weights is between 337.5 grams and 425.9 grams.
Suppose the Corps of Engineers’ threshold is $\sigma = 500$ grams. That is, if the standard deviation in fish weights is 500 grams or higher, further DDT contamination tests will be suspended due to the instability of the fish weights. Since the 95% confidence interval for $\sigma$ lies below 500 grams, the engineers will continue the DDT contamination tests on the fish.
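The square-root step and the comparison with the 500-gram threshold can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the endpoints are the rounded values reported for the variance interval in Example 5.11.

```python
import math

# Reported 95% confidence interval for the variance (grams squared).
var_low, var_high = 113_907, 181_371

# Square roots of the endpoints give the interval for the standard deviation (grams).
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)
print(f"{sd_low:.1f} to {sd_high:.1f} grams")   # 337.5 to 425.9 grams

# Testing continues only if the whole interval lies below the threshold.
threshold = 500
continue_testing = sd_high < threshold
print(continue_testing)                          # True
```

Because the square root is an increasing function, the order of the endpoints is preserved and the confidence level stays at 95%.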
Now Work Exercise 5.107b
The procedure for estimating either $\sigma^2$ or $\sigma$ requires an assumption regardless of whether the sample size n is large or small (see the conditions in the box). The sampled data must come from a population that has an approximate normal distribution. Unlike small-sample confidence intervals for $\mu$ based on the t-distribution, slight to moderate departures from normality will render the chi-square confidence interval for $\sigma^2$ invalid.
5.99 What sampling distribution is used to find an interval estimate for $\sigma^2$?
5.100 What conditions are required for a valid confidence interval for $\sigma^2$?
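A small simulation can illustrate how much the normality assumption matters. The sketch below makes several assumptions that are not from the text: samples of size n = 10, a standard normal population versus a strongly skewed exponential population (both with variance 1), and 4,000 repetitions. The upper critical value comes from the 9-df row of Table 5.7; the lower one is the standard tabled value for 9 df, which the excerpt does not show.

```python
import random
import statistics

# Critical values for 9 df (n = 10) and a 95% interval.
CHI2_UPPER = 19.0228   # chi-square_.025, 9-df row of Table 5.7
CHI2_LOWER = 2.70039   # chi-square_.975, standard table value (assumed here)

def variance_ci(sample):
    """95% chi-square interval for the population variance (expects n = 10)."""
    n = len(sample)
    s2 = statistics.variance(sample)          # sample variance, divisor n - 1
    return (n - 1) * s2 / CHI2_UPPER, (n - 1) * s2 / CHI2_LOWER

def coverage(draw, true_variance, reps=4000, n=10):
    """Fraction of simulated intervals that contain the true variance."""
    hits = 0
    for _ in range(reps):
        low, high = variance_ci([draw() for _ in range(n)])
        hits += low <= true_variance <= high
    return hits / reps

random.seed(0)
normal_cov = coverage(lambda: random.gauss(0.0, 1.0), 1.0)
skewed_cov = coverage(lambda: random.expovariate(1.0), 1.0)
print(f"normal population: {normal_cov:.3f}")
print(f"skewed population: {skewed_cov:.3f}")
```

Under these assumptions the normal population should give coverage near the nominal 95%, while the skewed population should fall noticeably short of it.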
5.101 How many degrees of freedom are associated with a chi-square sampling distribution for a sample of size n?
5.102 For each of the following combinations of $\alpha$ and degrees of freedom (df), use either Table IV in Appendix B or statistical software to find the values of $\chi^2_{\alpha/2}$ and $\chi^2_{(1-\alpha/2)}$ that would be used to form a confidence interval for $\sigma^2$.
5.103 Given the following values of $\bar{x}$, s, and n, form a 90% confidence interval for $\sigma^2$.
5.104 Refer to Exercise 5.103. For each part, a–d, form a 90% confidence interval for $\sigma$.
L05105 5.105 A random sample of $n = 6$ observations from a normal distribution resulted in the data shown in the table. Compute a 95% confidence interval for $\sigma^2$.
8 | 2 | 3 | 7 | 11 | 6 |
SUSTAIN 5.106 Corporate sustainability of CPA firms. Refer to the Business and Society (Mar. 2011) study on the sustainability behaviors of CPA corporations, Exercise 5.18 (p. 262). Recall that the level of support for corporate sustainability (measured on a quantitative scale ranging from 0 to 160 points) was obtained for each in a sample of 992 senior managers at CPA firms. The accompanying MINITAB printout gives 90% confidence intervals for both the variance and standard deviation of level of support for all senior managers at CPA firms.
a. Locate the 90% confidence interval for $\sigma^2$ on the printout.
b. Use the sample variance on the printout to calculate the 90% confidence interval for $\sigma^2$. Does your result agree with the interval shown on the printout?
c. Locate the 90% confidence interval for $\sigma$ on the printout.
d. Use the result, part a, to calculate the 90% confidence interval for $\sigma$. Does your result agree with the interval shown on the printout?
e. Give a practical interpretation of the 90% confidence interval for $\sigma$.
f. What assumption about the distribution of level of support is required for the inference, part e, to be valid? Is this assumption reasonably satisfied?
ROCKS 5.107 Characteristics of a rockfall. Refer to the Environmental Geology (Vol. 58, 2009) simulation study of how far a block from a collapsing rock wall will bounce down a soil slope, Exercise 2.61 (p. 61). Rebound lengths (in meters) were estimated for 13 rock bounces. The data are repeated in the table. A MINITAB analysis of the data is shown in the printout below.
10.94 | 13.71 | 11.38 | 7.26 | 17.83 | 11.92 | 11.87 |
5.44 | 13.35 | 4.90 | 5.85 | 5.10 | 6.77 |
Based on Paronuzzi, P. “Rockfall-induced block propagation on a soil slope, northern Italy.” Environmental Geology, Vol. 58, 2009 (Table 2).
a. Locate a 95% confidence interval for $\sigma^2$ on the printout. Interpret the result.
b. Locate a 95% confidence interval for $\sigma$ on the printout. Interpret the result.
c. What conditions are required for the intervals, parts a and b, to be valid?
5.108 Motivation of drug dealers. Refer to the Applied Psychology in Criminal Justice (Sept. 2009) study of the personality characteristics of convicted drug dealers, Exercise 5.17 (p. 262). A random sample of 100 drug dealers had a mean Wanting Recognition (WR) score of 39 points, with a standard deviation of 6 points. The researchers also are interested in $\sigma^2$, the variation in WR scores for all convicted drug dealers.
a. Identify the target parameter, in symbols and words.
b. Compute a 99% confidence interval for $\sigma^2$.
c. What does it mean to say that the target parameter lies within the interval with “99% confidence”?
d. What assumption about the data must be satisfied in order for the confidence interval to be valid?
e. To obtain a practical interpretation of the interval, part b, explain why a confidence interval for the standard deviation, $\sigma$, is desired.
f. Use the results, part b, to compute a 99% confidence interval for $\sigma$. Give a practical interpretation of the interval.
5.109 Facial structure of CEOs. Refer to the Psychological Science (Vol. 22, 2011) study of a chief executive officer’s facial structure, Exercise 5.21 (p. 263). Recall that the facial width-to-height ratio (WHR) was determined by computer analysis for each in a sample of 55 CEOs at publicly traded Fortune 500 firms, with the following results: .
a. Find a 95% confidence interval for the standard deviation, $\sigma$, of the facial WHR values for all CEOs at publicly traded Fortune 500 firms. Interpret the result.
b. For the interval, part a, to be valid, the population of WHR values should be distributed how? Draw a sketch of the required distribution to support your answer.
5.110 Antigens for a parasitic roundworm in birds. Refer to the Gene Therapy and Molecular Biology (June 2009) study of DNA in peptide (protein) produced by antigens for a parasitic roundworm in birds, Exercise 5.50 (p. 274). Recall that scientists tested each in a sample of 4 alleles of antigen-produced protein for level of peptide. The results were: . Use this information to construct a 90% confidence interval for the true variation in peptide scores for alleles of the antigen-produced protein. Interpret the interval for the scientists.
5.111 Oil content of fried sweet potato chips. The characteristics of sweet potato chips fried at different temperatures were investigated in the Journal of Food Engineering (Sept. 2013). A sample of 6 sweet potato slices were fried at 130° using a vacuum fryer. One characteristic of interest to the researchers was internal oil content (measured in gigagrams). The results were: and Use this information to construct a 95% confidence interval for the true standard deviation of the internal oil content distribution for the sweet potato chips. Interpret the result practically.
5.112 Radon exposure in Egyptian tombs. Refer to the Radiation Protection Dosimetry (Dec. 2010) study of radon exposure in tombs carved from limestone in the Egyptian Valley of Kings, Exercise 5.39 (p. 272). The radon levels in the inner chambers of a sample of 12 tombs were determined, yielding the following summary statistics: and Use this information to estimate, with 95% confidence, the true standard deviation of radon levels in tombs in the Valley of Kings. Interpret the resulting interval. Be sure to give the units of measurement in your interpretation.
MOLARS 5.113 Cheek teeth of extinct primates. Refer to the American Journal of Physical Anthropology (Vol. 142, 2010) study of the characteristics of cheek teeth (e.g., molars) in an extinct primate species, Exercise 2.38 (p. 50). Recall that the researchers recorded the dentary depth of molars (in millimeters) for a sample of 18 cheek teeth extracted from skulls. The data are repeated in the table. Estimate the true standard deviation in molar depths for the population of cheek teeth in extinct primates using a 95% confidence interval. Give a practical interpretation of the result. Are the conditions required for a valid confidence interval reasonably satisfied?
18.12 | 16.55 |
19.48 | 15.70 |
19.36 | 17.83 |
15.94 | 13.25 |
15.83 | 16.12 |
19.70 | 18.13 |
15.76 | 14.02 |
17.00 | 14.04 |
13.96 | 16.20 |
Based on Boyer, D. M., Evans, A. R., and Jernvall, J. “Evidence of dietary differentiation among Late Paleocene–Early Eocene Plesiadapids (Mammalia, Primates).” American Journal of Physical Anthropology, Vol. 142, 2010 (Table A3).
TURTLES 5.114 Shell lengths of sea turtles. Refer to the Aquatic Biology (Vol. 9, 2010) study of green sea turtles inhabiting the Grand Cayman South Sound lagoon, Exercise 5.24 (p. 264). Recall that the data on shell length, measured in centimeters, for 76 captured turtles are saved in the TURTLES file. Use the sample data to estimate the true variance in shell lengths of all green sea turtles in the lagoon with 90% confidence. Interpret the result.
TRAPS 5.115 Lobster trap placement. Refer to the Bulletin of Marine Science (Apr. 2010) study of red spiny lobster trap placement, Exercise 5.41 (p. 273). Trap spacing measurements (in meters) for a sample of seven teams of red spiny lobster fishermen are repeated in the table below. The researchers want to know how variable the trap spacing measurements are for the population of red spiny lobster fishermen fishing in Baja California Sur, Mexico. Provide the researchers with an estimate of the target parameter using a 99% confidence interval.
93 | 99 | 105 | 94 | 82 | 70 | 86 |
Based on Shester, G. G. “Explaining catch variation among Baja California lobster fishers through spatial analysis of trap-placement decisions.” Bulletin of Marine Science, Vol. 86, No. 2, Apr. 2010 (Table 1), pp. 479–498.
COUGH 5.116 Is honey a cough remedy? Refer to the Archives of Pediatrics and Adolescent Medicine (Dec. 2007) study of honey as a remedy for coughing, Exercise 2.40 (p. 51). Recall that the 105 ill children in the sample were randomly divided into groups. One group received a dosage of an over-the-counter cough medicine (DM); another group received a dosage of honey (H). The coughing improvement scores (as determined by the children’s parents) for the patients in the two groups are reproduced in the table on p. 295. The pediatric researchers desire information on the variation in coughing improvement scores for each of the two groups.
a. Find a 90% confidence interval for the standard deviation in improvement scores for the honey dosage group.
b. Repeat part a for the DM dosage group.
c. Based on the results, parts a and b, what conclusions can the pediatric researchers draw about which group has the smaller variation in improvement scores? (We demonstrate a more statistically valid method for comparing variances in Chapter 9.)
Honey Dosage: | 12 | 11 | 15 | 11 | 10 | 13 | 10 | 4 | 15 | 16 | 9 | 14 | 10 | 6 |
 | 10 | 8 | 11 | 12 | 12 | 8 | 12 | 9 | 11 | 15 | 10 | 15 | 9 | 13 |
 | 8 | 12 | 10 | 8 | 9 | 5 | 12 |
DM Dosage: | 4 | 6 | 9 | 4 | 7 | 7 | 7 | 9 | 12 | 10 | 11 | 6 | 3 | 4 |
 | 9 | 12 | 7 | 6 | 8 | 12 | 12 | 4 | 12 | 13 | 7 | 10 | 13 | 9 |
 | 4 | 4 | 10 | 15 | 9 |
Based on Paul, I. M., et al. “Effect of honey, dextromethorphan, and no treatment on nocturnal cough and sleep quality for coughing children and their parents.” Archives of Pediatrics and Adolescent Medicine, Vol. 161, No. 12, Dec. 2007 (data simulated).
PHISH 5.117 Phishing attacks to e-mail accounts. Refer to the Chance (Summer 2007) study of an actual phishing attack against an organization, Exercise 4.164 (p. 236). Recall that phishing describes an attempt to extract personal/financial information from unsuspecting people through fraudulent e-mail. The interarrival times (in seconds) for 267 fraud box e-mail notifications are saved in the accompanying file. As in Exercise 4.164, consider these interarrival times to represent the population of interest.
a. Obtain a random sample of interarrival times from the population.
b. Use the sample, part a, to obtain an interval estimate of the population variance of the interarrival times. What is the measure of reliability for your estimate?
c. Find the true population variance for the data. Does the interval, part b, contain the true variance? Give one reason why it may not.