Chapter 10
Having Confidence in Your Results
In This Chapter
Investigating the basics of confidence intervals
Determining confidence intervals for a number of statistics
Linking significance testing to confidence intervals
In Chapter 9, I show you how to express the precision of a numeric result using the standard error (SE) and how to calculate the SE (or have a computer calculate it for you) for the most common kinds of numerical results you get from biological studies — means, proportions, event rates, and regression coefficients. But the SE is only one way of specifying how precise your results are. In this chapter, I describe another commonly used indicator of precision — the confidence interval (CI).
Feeling Confident about Confidence Interval Basics
Before jumping into the main part of this chapter (how to calculate confidence intervals around the sample statistics you get from your experiments), it’s important to be comfortable with the basic concepts and terminology related to confidence intervals. This is an area where nuances of meaning can be tricky, and the right-sounding words can be used the wrong way.
Defining confidence intervals
Unlike the SE, which is usually written as a ± number immediately following your measured value (for example, a blood glucose measurement of 120 ± 3 mg/dL), the CI is usually written as a pair of numbers separated by a dash, like this: 114–126. The two numbers that make up the lower and upper ends of the confidence interval are called the lower and upper confidence limits (CLs). Sometimes you see the abbreviations written with a subscript L or U, like this: CLL or CLU, indicating the lower and upper confidence limits, respectively.
A standard error indicates how much your observed sample statistic may fluctuate if the same experiment is repeated a large number of times, so the SE focuses on the sample.
A confidence interval indicates the range that’s likely to contain the true population parameter, so the CI focuses on the population.
Looking at confidence levels
The probability that the confidence interval encompasses the true value is called the confidence level of the CI. You can calculate a CI for any confidence level you like, but the most commonly seen value is 95 percent. Whenever you report a confidence interval, you must state the confidence level, like this: 95% CI = 114–126.
In general, higher confidence levels correspond to wider confidence intervals, and lower confidence levels to narrower intervals. For example, the range 118–122 may have a 50 percent chance of containing the true population parameter within it; 115–125 may have a 90 percent chance of containing the truth, and 112–128 may have a 99 percent chance.
Taking sides with confidence intervals
Properly calculated 95 percent confidence intervals contain the true value 95 percent of the time and fail to contain the true value the other 5 percent of the time. Usually, 95 percent confidence limits are calculated to be balanced so that the 5 percent failures are split evenly — the true value is less than the lower confidence limit 2.5 percent of the time and greater than the upper confidence limit 2.5 percent of the time. This is called a two-sided, balanced CI.
But the confidence limits don’t have to be balanced. Sometimes the consequences of overestimating a value may be more severe than underestimating it, or vice versa. You can calculate an unbalanced, two-sided, 95 percent confidence limit that splits the 5 percent exceptions so that the true value is smaller than the lower confidence limit 4 percent of the time, and larger than the upper confidence limit 1 percent of the time. Unbalanced confidence limits extend farther out from the estimated value on the side with the smaller percentage.
In some situations, like noninferiority studies (described in Chapter 16), you may want all the failures to be on one side; that is, you want a one-sided confidence limit. The other side then extends out an infinite distance. For example, you can have an observed value of 120 with a one-sided confidence interval that goes from minus infinity to +125 or from 115 to plus infinity.
Calculating Confidence Intervals
Just as the SE formulas in Chapter 9 depend on what kind of sample statistic you’re dealing with (whether you’re measuring or counting something or getting it from a regression program or from some other calculation), confidence intervals (CIs) are calculated in different ways depending on how you obtain the sample statistic. In the following sections, I describe methods for the most common situations, using the same examples I use in Chapter 9 for calculating standard errors.
Before you begin: Formulas for confidence limits in large samples
Most of the approximate methods I describe in the following sections are based on the assumption that your observed value has a sampling distribution that’s (at least approximately) normally distributed. Fortunately, there are good theoretical and practical reasons to believe that almost every sample statistic you’re likely to encounter in practical work will have a nearly normal sampling distribution, for large enough samples.
Calling the observed value of your sample statistic V and its standard error SE, the lower and upper confidence limits are

CLL = V – k × SE
CLU = V + k × SE
Confidence limits computed this way are often referred to as normal-based, asymptotic, or central-limit-theorem (CLT) confidence limits. (The CLT, which I introduce in Chapter 9, provides good reason to believe that almost any sample statistic you're likely to encounter will be nearly normally distributed for large samples.) The value of k in the formulas depends on the desired confidence level and can be obtained from a table of critical values for the normal distribution or from a web page such as StatPages.info/pdfs.html. Table 10-1 lists the k values for some commonly used confidence levels.
Table 10-1 Multipliers for Normal-Based Confidence Intervals

Confidence Level | Tail Probability | k Value
50% | 0.50 | 0.67
80% | 0.20 | 1.28
90% | 0.10 | 1.64
95% | 0.05 | 1.96
98% | 0.02 | 2.33
99% | 0.01 | 2.58
The confidence interval around a mean
Suppose you study 25 adult diabetics (N = 25) and find that they have an average fasting blood glucose level of 130 mg/dL with a standard deviation (SD) of ±40 mg/dL. What is the 95 percent confidence interval around that 130 mg/dL estimated mean?
To calculate the confidence limits around a mean using the formulas in the preceding section, you first calculate the standard error of the mean (the SEM), which (from Chapter 9) is SEM = SD/√N, where SD is the standard deviation of the N individual values. So for the glucose example, the SE of the mean is 40/√25, which is equal to 40/5, or 8 mg/dL.
Using k = 1.96 for a 95 percent confidence level (from Table 10-1), the lower and upper confidence limits around the mean are
CLL = 130 – 1.96 × 8 = 114.3
CLU = 130 + 1.96 × 8 = 145.7
You report your result this way: mean glucose = 130 mg/dL, 95% CI = 114–146 mg/dL. (Don’t report numbers to more decimal places than their precision warrants. In this example, the digits after the decimal point are practically meaningless, so the numbers are rounded off.)
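The whole calculation can be sketched in a few lines of Python (a minimal illustration of the formulas above; the function name is my own):

```python
from math import sqrt

def mean_ci(mean, sd, n, k=1.96):
    """Normal-based CI around a sample mean: SEM = SD/sqrt(N),
    limits = mean ± k × SEM."""
    sem = sd / sqrt(n)
    return mean - k * sem, mean + k * sem

# glucose example: mean 130, SD 40, N = 25, so SEM = 40/5 = 8
lo, hi = mean_ci(130, 40, 25)
print(f"95% CI = {lo:.1f}-{hi:.1f}")  # 114.3-145.7
```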
If your data are noticeably skewed (for example, log-normally distributed), a normal-based CI around the arithmetic mean can be misleading. Instead, you can calculate a CI around the geometric mean:

1. Take the logarithm of every individual subject’s value.
2. Find the mean, SD, and SEM of these logarithms.
3. Use the normal-based formulas to get the confidence limits (CLs) around the mean of the logarithms.
4. Calculate the antilogarithm of the mean of the logs.
The result is the geometric mean of the original values. (See Chapter 8.)
5. Calculate the antilogarithm of the lower and upper CLs.
These are the lower and upper CLs around the geometric mean.
The confidence interval around a proportion
If you were to survey 100 typical children and find that 70 of them like chocolate, you’d estimate that 70 percent of children like chocolate. What is the 95 percent CI around that 70 percent estimate?
There are many approximate formulas for confidence intervals around an observed proportion (also called binomial confidence intervals). The simplest method is based on approximating the binomial distribution by a normal distribution (see Chapter 25). It should be used only when N (the denominator of the proportion) is large (at least 50), and the proportion is not too close to 0 or 1 (say, between 0.2 and 0.8). You first calculate the SE of the proportion as described in Chapter 9, SE = √(p(1 – p)/N), and then you use the normal-based formulas in the earlier section Before you begin: Formulas for confidence limits in large samples.
Using the numbers from the preceding example, you have p = 0.7 and N = 100, so the SE for the proportion is √(0.7 × 0.3/100), or 0.046. From Table 10-1, k is 1.96 for 95 percent confidence limits. So CLL = 0.7 – 1.96 × 0.046 and CLU = 0.7 + 1.96 × 0.046, which works out to a 95 percent CI of 0.61 to 0.79. To express these fractions as percentages, you report your result this way: “The percentage of children in the sample who liked chocolate was 70 percent, 95% CI = 61–79%.”
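As a quick check of the arithmetic, here's the normal-based proportion CI in Python (a minimal sketch; the function name is my own):

```python
from math import sqrt

def proportion_ci(p, n, k=1.96):
    """Normal-based (large-sample) CI around an observed proportion:
    SE = sqrt(p(1-p)/N), limits = p ± k × SE."""
    se = sqrt(p * (1 - p) / n)
    return p - k * se, p + k * se

# chocolate example: p = 0.7, N = 100, so SE ≈ 0.046
lo, hi = proportion_ci(0.7, 100)
print(f"95% CI = {lo:.2f}-{hi:.2f}")  # 0.61-0.79
```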
Many other approximate formulas for CIs around observed proportions exist, most of which are more reliable when N is small. There are also several exact methods, the first and most famous of which is called the Clopper-Pearson method, named after the authors of a classic 1934 article. The Clopper-Pearson calculations are too complicated to attempt by hand, but fortunately, many statistical packages can do them for you.
The confidence interval around an event count or rate
Suppose that there were 36 fatal highway accidents in your county in the last three months. If that’s the only safety data you have to go on, then your best estimate of the monthly fatal accident rate is simply the observed count (N), divided by the length of time (T) during which the N counts were observed: 36/3, or 12.0 fatal accidents per month. What is the 95 percent CI around that estimate?
There are many approximate formulas for the CIs around an observed event count or rate (also called a Poisson CI). The simplest method is based on approximating the Poisson distribution by a normal distribution (see Chapter 25). It should be used only when N is large (at least 50). You first calculate the SE of the event rate as described in Chapter 9, SE = √N/T; then you use the normal-based formulas in the earlier section Before you begin: Formulas for confidence limits in large samples.
Using the numbers from the fatal-accident example, N = 36 and T = 3, so the SE for the event rate is √36/3, or 2.0. According to Table 10-1, k is 1.96 for 95 percent CLs. So CLL = 12.0 – 1.96 × 2.0 and CLU = 12.0 + 1.96 × 2.0, which works out to 95 percent confidence limits of 8.08 and 15.92. You report your result this way: “The fatal accident rate was 12.0, 95% CI = 8.1–15.9 fatal accidents per month.”
To calculate the CI around the event count itself, you estimate the SE of the count N as √N, then calculate the CI around the observed count using the formulas in the earlier section Before you begin: Formulas for confidence limits in large samples. So the SE of the 36 observed fatal accidents in a three-month period is simply √36, which equals 6.0. So CLL = 36.0 – 1.96 × 6.0 and CLU = 36.0 + 1.96 × 6.0, which works out to a 95 percent CI of 24.2 to 47.8 accidents in a three-month period.
Many other approximate formulas for CIs around observed event counts and rates are available, most of which are more reliable when N is small. There are also several exact methods. They’re too complicated to attempt by hand, involving evaluating the Poisson distribution repeatedly to find values for the true mean event count that are consistent with (that is, not significantly different from) the count you actually observed. Fortunately, many statistical packages can do these calculations for you.
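The two normal-based Poisson calculations above can be sketched together (a minimal illustration; the function names are my own):

```python
from math import sqrt

def event_rate_ci(n_events, t, k=1.96):
    """Normal-based CI around an event rate R = N/T, using SE = sqrt(N)/T."""
    rate = n_events / t
    se = sqrt(n_events) / t
    return rate - k * se, rate + k * se

def event_count_ci(n_events, k=1.96):
    """Normal-based CI around the count itself, using SE = sqrt(N)."""
    se = sqrt(n_events)
    return n_events - k * se, n_events + k * se

# fatal-accident example: N = 36 events in T = 3 months
print(event_rate_ci(36, 3))   # SE = 6/3 = 2.0, about (8.1, 15.9)
print(event_count_ci(36))     # SE = 6.0, about (24.2, 47.8)
```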
The confidence interval around a regression coefficient
Suppose you’re interested in whether or not blood urea nitrogen (BUN), a measure of kidney performance, tends to increase after age 60 in healthy adults. You can enroll a bunch of generally healthy adults age 60 and above, record their ages, and measure their BUN. Then you can create a scatter plot of BUN versus age and fit a straight line to the data points (see Chapter 18). The slope of this line would have units of (mg/dL)/year and would tell you how much, on average, a healthy person’s BUN goes up with every additional year of age after age 60. Suppose the answer you get is that BUN increases 1.4 mg/dL per year. What is the 95 percent CI around that estimate of yearly increase?
This is one time you don’t need any formulas. Any good regression program (like the ones described in Chapter 4) can provide the SE for every parameter it fits to your data. (Chapter 18 describes where to find the SE for the slope of a straight line.) The regression program may also provide the confidence limits for any confidence level you specify, but if it doesn’t, you can easily calculate the confidence limits using the formulas in the earlier section Before you begin: Formulas for confidence limits in large samples.
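If you're curious where the SE of the slope comes from, here's a minimal straight-line-fit sketch with a normal-based CI (the data below are hypothetical; for small samples, a t-based multiplier is more accurate than k = 1.96):

```python
from math import sqrt

def slope_with_ci(xs, ys, k=1.96):
    """Least-squares slope with a normal-based CI.
    SE(slope) = s / sqrt(Sxx), where s is the residual standard deviation."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))            # residual SD, n - 2 degrees of freedom
    se_slope = s / sqrt(sxx)
    return slope, slope - k * se_slope, slope + k * se_slope

# hypothetical (age, BUN) pairs, for illustration only
ages = [60, 63, 67, 71, 75, 80]
bun = [14.8, 19.1, 25.0, 30.2, 35.5, 42.9]
slope, lo, hi = slope_with_ci(ages, bun)
print(f"slope = {slope:.2f} (mg/dL)/year, 95% CI = {lo:.2f}-{hi:.2f}")
```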
Relating Confidence Intervals and Significance Testing
You can use confidence intervals (CIs) as an alternative to some of the usual significance tests (see Chapter 3 for an introduction to the concepts and terminology of significance testing and Chapters 12–15 for descriptions of specific significance tests). To assess significance using CIs, you first define a number that measures the amount of effect you’re testing for. This effect size can be the difference between two means or two proportions, the ratio of two means, an odds ratio, a relative risk ratio, or a hazard ratio, among others. The complete absence of any effect corresponds to a difference of 0, or a ratio of 1, so I call these the “no-effect” values.
If the 95 percent CI around the observed effect size includes the no-effect value (0 for differences, 1 for ratios), then the effect is not statistically significant (that is, a significance test for that effect will produce p > 0.05).
If the 95 percent CI around the observed effect size does not include the no-effect value, then the effect is significant (that is, a significance test for that effect will produce p ≤ 0.05).
The same kind of correspondence is true for other confidence levels and significance levels: 90 percent confidence levels correspond to the p = 0.10 significance level, 99 percent confidence levels correspond to the p = 0.01 significance level, and so on.
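You can check this correspondence numerically with a normal approximation; in the sketch below, the effect sizes and SEs are made-up numbers for illustration:

```python
from math import erf, sqrt

def two_sided_p(effect, se):
    """Two-sided p value for testing effect = 0, normal approximation."""
    z = abs(effect) / se
    normal_cdf = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - normal_cdf)

def ci_excludes_zero(effect, se, k=1.96):
    """True if the 95% CI around the effect excludes the no-effect value 0."""
    return not (effect - k * se <= 0 <= effect + k * se)

# the two criteria agree (up to the rounding of k = 1.96)
print(two_sided_p(3.0, 1.0) < 0.05, ci_excludes_zero(3.0, 1.0))  # True True
print(two_sided_p(1.0, 1.0) < 0.05, ci_excludes_zero(1.0, 1.0))  # False False
```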
So you have two different, but related, ways to prove that some effect is present — you can use significance tests, and you can use confidence intervals. Which one is better? The two methods are consistent with each other, but many people prefer the CI approach to the p-value approach. Why?
The p value is the result of the complex interplay between the observed effect size, the sample size, and the size of random fluctuations, all boiled down into a single number that doesn’t tell you whether the effect was large or small, clinically important or negligible.
The CI around the observed effect clearly shows you the effect size, along with an indicator of how uncertain your knowledge of that effect size is. It tells you not only whether the effect is statistically significant, but can also give you an intuitive sense of whether the effect is clinically important.
The CI approach lends itself to a very simple and natural way of comparing two products for equivalence or noninferiority, as I explain in Chapter 16.