Chapter 10

Having Confidence in Your Results

In This Chapter

arrow Investigating the basics of confidence intervals

arrow Determining confidence intervals for a number of statistics

arrow Linking significance testing to confidence intervals

In Chapter 9, I show you how to express the precision of a numeric result using the standard error (SE) and how to calculate the SE (or have a computer calculate it for you) for the most common kinds of numerical results you get from biological studies — means, proportions, event rates, and regression coefficients. But the SE is only one way of specifying how precise your results are. In this chapter, I describe another commonly used indicator of precision — the confidence interval (CI).

remember.eps I assume that you’re familiar with the concepts of populations, samples, and statistical estimation theory (see Chapter 3 if you’re not) and that you know what standard errors are (see Chapter 9 if you don’t). Always keep in mind that when you conduct any kind of research study, such as a clinical trial, you’re studying a small sample of subjects (like 50 adult volunteers with diabetes) that you’ve randomly selected as representing a large, hypothetical population (all adults with diabetes). And any numeric quantity (called a sample statistic) that you observe in this sample is just an imperfect estimate of the corresponding population parameter — the true value of that quantity in the population.

Feeling Confident about Confidence Interval Basics

Before jumping into the main part of this chapter (how to calculate confidence intervals around the sample statistics you get from your experiments), it’s important to be comfortable with the basic concepts and terminology related to confidence intervals. This is an area where nuances of meaning can be tricky, and the right-sounding words can be used the wrong way.

Defining confidence intervals

remember.eps Informally, a confidence interval indicates a range of values that’s likely to encompass the truth. More formally, the CI around your sample statistic is calculated in such a way that it has a specified chance of surrounding (or “containing”) the value of the corresponding population parameter.

Unlike the SE, which is usually written as a ± number immediately following your measured value (for example, a blood glucose measurement of 120 ± 3 mg/dL), the CI is usually written as a pair of numbers separated by a dash, like this: 114–126. The two numbers that make up the lower and upper ends of the confidence interval are called the lower and upper confidence limits (CLs). Sometimes you see the abbreviations written with a subscript L or U, like this: CLL or CLU, indicating the lower and upper confidence limits, respectively.

remember.eps Although SEs and CIs are both used as indicators of the precision of a numerical quantity, they differ in their focus (sample or population):

check.png A standard error indicates how much your observed sample statistic may fluctuate if the same experiment is repeated a large number of times, so the SE focuses on the sample.

check.png A confidence interval indicates the range that’s likely to contain the true population parameter, so the CI focuses on the population.

remember.eps One important property of confidence intervals (and standard errors) is that they vary inversely with the square root of the sample size. For example, if you were to quadruple your sample size, it would cut the SE and the width of the CI in half. This “square root law” is one of the most widely applicable rules in all of statistics.
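If you want to see this square-root behavior in action, here's a tiny Python sketch (the SD of 40 mg/dL is just a made-up example value):

```python
import math

sd = 40.0  # a hypothetical sample standard deviation (mg/dL)

# SE of the mean = SD / sqrt(N), so quadrupling N cuts the SE in half
se_25 = sd / math.sqrt(25)    # N = 25  -> SE = 8.0
se_100 = sd / math.sqrt(100)  # N = 100 -> SE = 4.0
```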

Looking at confidence levels

The probability that the confidence interval encompasses the true value is called the confidence level of the CI. You can calculate a CI for any confidence level you like, but the most commonly seen value is 95 percent. Whenever you report a confidence interval, you must state the confidence level, like this: 95% CI = 114–126.

In general, higher confidence levels correspond to wider confidence intervals, and lower confidence levels to narrower ones. For example, the range 118–122 may have a 50 percent chance of containing the true population parameter within it; 115–125 may have a 90 percent chance of containing the truth, and 112–128 may have a 99 percent chance.



warning_bomb.eps The confidence level is sometimes abbreviated CL, just like the confidence limit, which can be confusing. Fortunately, the distinction is usually clear from the context in which CL appears; when it’s not clear, I spell out what CL stands for.

Taking sides with confidence intervals

Properly calculated 95 percent confidence intervals contain the true value 95 percent of the time and fail to contain the true value the other 5 percent of the time. Usually, 95 percent confidence limits are calculated to be balanced so that the 5 percent failures are split evenly — the true value is less than the lower confidence limit 2.5 percent of the time and greater than the upper confidence limit 2.5 percent of the time. This is called a two-sided, balanced CI.

But the confidence limits don’t have to be balanced. Sometimes the consequences of overestimating a value may be more severe than underestimating it, or vice versa. You can calculate an unbalanced, two-sided, 95 percent confidence limit that splits the 5 percent exceptions so that the true value is smaller than the lower confidence limit 4 percent of the time, and larger than the upper confidence limit 1 percent of the time. Unbalanced confidence limits extend farther out from the estimated value on the side with the smaller percentage.

In some situations, like noninferiority studies (described in Chapter 16), you may want all the failures to be on one side; that is, you want a one-sided confidence limit. In that case, the other confidence limit extends out an infinite distance. For example, you can have an observed value of 120 with a one-sided confidence interval that goes from minus infinity to +125 or from 115 to plus infinity.
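If you're curious how those lopsided limits come about, the following Python sketch computes the k multipliers for a balanced, an unbalanced (4 percent/1 percent), and a one-sided 95 percent CI from the standard normal distribution; the observed value and SE here are made up for illustration:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function
V, SE = 120.0, 2.55       # hypothetical observed value and its standard error

# Balanced two-sided 95% CI: 2.5% of failures in each tail
lo_bal, hi_bal = V - z(0.975) * SE, V + z(0.975) * SE

# Unbalanced two-sided 95% CI: 4% below the lower limit, 1% above the upper
lo_unbal, hi_unbal = V - z(0.96) * SE, V + z(0.99) * SE

# One-sided 95% upper limit: all 5% in the upper tail; the lower limit is -infinity
hi_one_sided = V + z(0.95) * SE
```

Notice that the 1 percent side uses a larger multiplier (about 2.33) than the 4 percent side (about 1.75), so the interval does extend farther out on the side with the smaller percentage.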

Calculating Confidence Intervals

Just as the SE formulas in Chapter 9 depend on what kind of sample statistic you’re dealing with (whether you’re measuring or counting something or getting it from a regression program or from some other calculation), confidence intervals (CIs) are calculated in different ways depending on how you obtain the sample statistic. In the following sections, I describe methods for the most common situations, using the same examples I use in Chapter 9 for calculating standard errors.

tip.eps You can use a "bootstrap" simulation method to calculate the SE of any quantity you can calculate from your data; you can use the same technique to generate CIs around those calculated quantities. You use your data in a special way (called "resampling") to simulate what might have happened if you had repeated your experiment many times over, each time calculating and recording the quantity you're interested in. The CI is simply the interval that encloses 95 percent of all these simulated values. I describe this method (with an example) in an online article at www.dummies.com/extras/biostatistics.
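Here's what that resampling idea looks like in Python, using a made-up glucose sample and the sample mean as the statistic of interest (in practice, you can bootstrap any quantity you can calculate from your data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sample of 25 fasting glucose values (mg/dL)
data = rng.normal(130, 40, size=25)

# Bootstrap: resample WITH replacement many times, recording the statistic each time
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(10_000)
])

# The 95% bootstrap CI is the interval enclosing the middle 95% of the simulated values
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
```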

Before you begin: Formulas for confidence limits in large samples

Most of the approximate methods I describe in the following sections are based on the assumption that your observed value has a sampling distribution that’s (at least approximately) normally distributed. Fortunately, there are good theoretical and practical reasons to believe that almost every sample statistic you’re likely to encounter in practical work will have a nearly normal sampling distribution, for large enough samples.

remember.eps For any normally distributed sample statistic, the lower and upper confidence limits can be calculated very simply from the observed value (V) and standard error (SE) of the statistic:

CLL = V – k × SE

CLU = V + k × SE

Confidence limits computed this way are often referred to as normal-based, asymptotic, or central-limit-theorem (CLT) confidence limits. (The CLT, which I introduce in Chapter 9, provides good reason to believe that almost any sample statistic you're likely to encounter will be nearly normally distributed for large samples.) The value of k in the formulas depends on the desired confidence level and can be obtained from a table of critical values for the normal distribution or from a web page such as StatPages.info/pdfs.html. Table 10-1 lists the k values for some commonly used confidence levels.

Table 10-1 Multipliers for Normal-Based Confidence Intervals

Confidence Level    Tail Probability    k Value
50%                 0.50                0.67
80%                 0.20                1.28
90%                 0.10                1.64
95%                 0.05                1.96
98%                 0.02                2.33
99%                 0.01                2.58

tip.eps For the most commonly used confidence level, 95 percent, k is 1.96, or approximately 2. This leads to the very simple approximation that 95 percent confidence limits are about two standard errors above and below the observed value.
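As a quick check, the following Python sketch reproduces a couple of the k values from Table 10-1 and wraps the confidence-limit formulas in a small function (the function name is mine, not anything standard):

```python
from statistics import NormalDist

def normal_ci(value, se, level=0.95):
    """Normal-based CI: value ± k·SE, with k taken from the standard normal distribution."""
    k = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return value - k * se, value + k * se

# Reproduce two k values from Table 10-1
k95 = NormalDist().inv_cdf(0.975)  # ≈ 1.96
k99 = NormalDist().inv_cdf(0.995)  # ≈ 2.58

# The glucose example from later in this chapter: mean 130, SE 8
lo, hi = normal_ci(130.0, 8.0)     # ≈ (114.3, 145.7)
```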

warning_bomb.eps The distance of each confidence limit from the measured value, k × SE, is called the margin of error (ME). Because MEs are almost always calculated at the 95 percent confidence level, they’re usually about twice as large as the corresponding SEs. MEs are most commonly used to express the precision of the results of a survey, such as “These poll results have a margin of error of ±5 percent.” This usage can lead to some confusion because the SE is also usually expressed as a ± number. For this reason, it’s probably best to use the CI instead of the ME to express precision when reporting clinical research results. In any event, be sure to state which one you’re using when you report your results.

The confidence interval around a mean

Suppose you study 25 adult diabetics (N = 25) and find that they have an average fasting blood glucose level of 130 mg/dL with a standard deviation (SD) of ±40 mg/dL. What is the 95 percent confidence interval around that 130 mg/dL estimated mean?

To calculate the confidence limits around a mean using the formulas in the preceding section, you first calculate the standard error of the mean (the SEM), which (from Chapter 9) is SD/√N, where SD is the standard deviation of the N individual values. So for the glucose example, the SE of the mean is 40/√25, which is equal to 40/5, or 8 mg/dL.

Using k = 1.96 for a 95 percent confidence level (from Table 10-1), the lower and upper confidence limits around the mean are

CLL = 130 – 1.96 × 8 = 114.3

CLU = 130 + 1.96 × 8 = 145.7

You report your result this way: mean glucose = 130 mg/dL, 95%CI = 114–146 mg/dL. (Don’t report numbers to more decimal places than their precision warrants. In this example, the digits after the decimal point are practically meaningless, so the numbers are rounded off.)

tip.eps A more accurate version of the formulas in the preceding section uses k values derived from a table of critical values of the Student t distribution. You need to know the number of degrees of freedom, which, for a mean value, is always equal to N – 1. Using a Student t table (see Chapter 25) or a web page like StatPages.info/pdfs.html, you can find that the Student-based k value for a 95 percent confidence level and 24 degrees of freedom is equal to 2.06, a little bit larger than the normal-based k value. Using this k value instead of 1.96, you can calculate the 95 percent confidence limits as 113.52 and 146.48, which happen to round off to the same whole numbers as the normal-based confidence limits. Generally you don't have to use the more-complicated Student-based k values unless N is quite small (say, less than 10).
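To see both flavors side by side, here's a Python sketch of the glucose example using SciPy for the normal-based and Student-based k values:

```python
import math
from scipy.stats import norm, t

mean, sd, n = 130.0, 40.0, 25
sem = sd / math.sqrt(n)  # 40 / 5 = 8 mg/dL

# Normal-based 95% limits
k_norm = norm.ppf(0.975)                    # ≈ 1.96
lo_norm, hi_norm = mean - k_norm * sem, mean + k_norm * sem

# Student-based 95% limits (N - 1 = 24 degrees of freedom)
k_t = t.ppf(0.975, df=n - 1)                # ≈ 2.06
lo_t, hi_t = mean - k_t * sem, mean + k_t * sem
```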

technicalstuff.eps What if your original numbers (the ones being averaged) aren’t normally distributed? You shouldn’t just blindly apply the normal-based CI formulas for non-normally distributed data. If you know that your data is log-normally distributed (a very common type of non-normality), you can do the following:

1. Take the logarithm of every individual subject’s value.

2. Find the mean, SD, and SEM of these logarithms.

3. Use the normal-based formulas to get the confidence limits (CLs) around the mean of the logarithms.

4. Calculate the antilogarithm of the mean of the logs.

The result is the geometric mean of the original values. (See Chapter 8.)

5. Calculate the antilogarithm of the lower and upper CLs.

These are the lower and upper CLs around the geometric mean.
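Here's a Python sketch of those five steps, using a small made-up sample of log-normally distributed values:

```python
import math

# Hypothetical log-normally distributed measurements
values = [12.0, 15.5, 9.8, 22.1, 13.4, 18.0, 10.9, 30.2]

# Steps 1-2: log every value, then get the mean, SD, and SEM of the logs
logs = [math.log(v) for v in values]
n = len(logs)
mean_log = sum(logs) / n
sd_log = math.sqrt(sum((x - mean_log) ** 2 for x in logs) / (n - 1))
sem_log = sd_log / math.sqrt(n)

# Step 3: normal-based 95% CLs around the mean of the logs
lo_log, hi_log = mean_log - 1.96 * sem_log, mean_log + 1.96 * sem_log

# Steps 4-5: antilog everything back to the original scale
geometric_mean = math.exp(mean_log)
geo_lo, geo_hi = math.exp(lo_log), math.exp(hi_log)
```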

tip.eps If you don’t know what distribution your values have, you can use the bootstrapping approach described later in this chapter.

The confidence interval around a proportion

If you were to survey 100 typical children and find that 70 of them like chocolate, you’d estimate that 70 percent of children like chocolate. What is the 95 percent CI around that 70 percent estimate?

There are many approximate formulas for confidence intervals around an observed proportion (also called binomial confidence intervals). The simplest method is based on approximating the binomial distribution by a normal distribution (see Chapter 25). It should be used only when N (the denominator of the proportion) is large (at least 50), and the proportion is not too close to 0 or 1 (say, between 0.2 and 0.8). You first calculate the SE of the proportion as described in Chapter 9, SE = √[p(1 – p)/N], and then you use the normal-based formulas in the earlier section Before you begin: Formulas for confidence limits in large samples.

Using the numbers from the preceding example, you have p = 0.7 and N = 100, so the SE for the proportion is √(0.7 × 0.3/100), or 0.046. From Table 10-1, k is 1.96 for 95 percent confidence limits. So CLL = 0.7 – 1.96 × 0.046 and CLU = 0.7 + 1.96 × 0.046, which works out to a 95 percent CI of 0.61 to 0.79. To express these fractions as percentages, you report your result this way: “The percentage of children in the sample who liked chocolate was 70 percent, 95%CI = 61–79%.”
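The same arithmetic in Python, in case you want to check it:

```python
import math

p, n = 0.7, 100                         # 70 of 100 children like chocolate
se = math.sqrt(p * (1 - p) / n)         # ≈ 0.046

# Normal-based 95% confidence limits
lo, hi = p - 1.96 * se, p + 1.96 * se   # ≈ 0.61 to 0.79
```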

Many other approximate formulas for CIs around observed proportions exist, most of which are more reliable when N is small. There are also several exact methods, the first and most famous of which is called the Clopper-Pearson method, named after the authors of a classic 1934 article. The Clopper-Pearson calculations are too complicated to attempt by hand, but fortunately, many statistical packages can do them for you.

tip.eps You can also go to the Binomial Confidence Intervals section of the online web calculator at StatPages.info/confint.html. Enter the numerator (70) and denominator (100) of the fraction, and press the Compute button. The page calculates the observed proportion (0.7) and the exact confidence limits (0.600 and 0.788), which you can convert to percentages and express as 95%CI = 60–79%. For this example, the normal-based approximate CI (61–79%) is very close to the exact CI, mainly because the sample size was quite large. For small samples, you should report exact confidence limits.
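If you'd rather compute the exact Clopper-Pearson limits yourself, they can be obtained from quantiles of the beta distribution; here's a SciPy sketch that reproduces the web calculator's answer:

```python
from scipy.stats import beta

x, n = 70, 100  # successes, trials

# Clopper-Pearson "exact" 95% limits via beta-distribution quantiles
lo = beta.ppf(0.025, x, n - x + 1)    # ≈ 0.600
hi = beta.ppf(0.975, x + 1, n - x)    # ≈ 0.788
```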

The confidence interval around an event count or rate

Suppose that there were 36 fatal highway accidents in your county in the last three months. If that’s the only safety data you have to go on, then your best estimate of the monthly fatal accident rate is simply the observed count (N), divided by the length of time (T) during which the N counts were observed: 36/3, or 12.0 fatal accidents per month. What is the 95 percent CI around that estimate?

There are many approximate formulas for the CIs around an observed event count or rate (also called a Poisson CI). The simplest method is based on approximating the Poisson distribution by a normal distribution (see Chapter 25). It should be used only when N is large (at least 50). You first calculate the SE of the event rate as described in Chapter 9, SE = √N/T; then you use the normal-based formulas in the earlier section Before you begin: Formulas for confidence limits in large samples.

Using the numbers from the fatal-accident example, N = 36 and T = 3, so the SE for the event rate is √36/3, or 2.0. According to Table 10-1, k is 1.96 for 95 percent CLs. So CLL = 12.0 – 1.96 × 2.0 and CLU = 12.0 + 1.96 × 2.0, which works out to 95 percent confidence limits of 8.1 and 15.9. You report your result this way: “The fatal accident rate was 12.0, 95%CI = 8.1–15.9 fatal accidents per month.”

To calculate the CI around the event count itself, you estimate the SE of the count N as √N, then calculate the CI around the observed count using the formulas in the earlier section Before you begin: Formulas for confidence limits in large samples. So the SE of the 36 observed fatal accidents in a three-month period is simply √36, which equals 6.0. So CLL = 36.0 – 1.96 × 6.0 and CLU = 36.0 + 1.96 × 6.0, which works out to a 95 percent CI of 24.2 to 47.8 accidents in a three-month period.
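Both the rate CI and the count CI are easy to check in Python:

```python
import math

n_events, t_months = 36, 3.0
rate = n_events / t_months                 # 12.0 fatal accidents per month

# SE of the rate = sqrt(N)/T; SE of the count = sqrt(N)
se_rate = math.sqrt(n_events) / t_months   # 2.0
se_count = math.sqrt(n_events)             # 6.0

# Normal-based 95% limits
rate_lo, rate_hi = rate - 1.96 * se_rate, rate + 1.96 * se_rate            # ≈ 8.1 to 15.9
count_lo, count_hi = n_events - 1.96 * se_count, n_events + 1.96 * se_count  # ≈ 24.2 to 47.8
```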

Many other approximate formulas for CIs around observed event counts and rates are available, most of which are more reliable when N is small. There are also several exact methods. They’re too complicated to attempt by hand, involving evaluating the Poisson distribution repeatedly to find values for the true mean event count that are consistent with (that is, not significantly different from) the count you actually observed. Fortunately, many statistical packages can do these calculations for you.

tip.eps You can also go to the Poisson Confidence Intervals section of the online web calculator at StatPages.info/confint.html. Enter the observed count (36) and press the Compute button. The page calculates the exact 95 percent CI (25.2–49.8). For this example, the normal-based CI (24.2–47.8) is only a rough approximation to the exact CI, mainly because the event count was only 36 accidents. For small samples, you should report exact confidence limits.
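The exact Poisson limits can also be computed from quantiles of the chi-square distribution; here's a SciPy sketch that reproduces the web calculator's answer for a count of 36:

```python
from scipy.stats import chi2

n = 36  # observed event count

# Exact 95% Poisson limits via chi-square quantiles
lo = chi2.ppf(0.025, 2 * n) / 2          # ≈ 25.2
hi = chi2.ppf(0.975, 2 * (n + 1)) / 2    # ≈ 49.8
```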

The confidence interval around a regression coefficient

Suppose you’re interested in whether or not blood urea nitrogen (BUN), a measure of kidney performance, tends to increase after age 60 in healthy adults. You can enroll a bunch of generally healthy adults age 60 and above, record their ages, and measure their BUN. Then you can create a scatter plot of BUN versus age and fit a straight line to the data points (see Chapter 18). The slope of this line would have units of (mg/dL)/year and would tell you how much, on average, a healthy person’s BUN goes up with every additional year of age after age 60. Suppose the answer you get is that BUN increases 1.4 mg/dL per year. What is the 95 percent CI around that estimate of yearly increase?

This is one time you don’t need any formulas. Any good regression program (like the ones described in Chapter 4) can provide the SE for every parameter it fits to your data. (Chapter 18 describes where to find the SE for the slope of a straight line.) The regression program may also provide the confidence limits for any confidence level you specify, but if it doesn’t, you can easily calculate the confidence limits using the formulas in the earlier section Before you begin: Formulas for confidence limits in large samples.

Relating Confidence Intervals and Significance Testing

You can use confidence intervals (CIs) as an alternative to some of the usual significance tests (see Chapter 3 for an introduction to the concepts and terminology of significance testing and Chapters 12–15 for descriptions of specific significance tests). To assess significance using CIs, you first define a number that measures the amount of effect you’re testing for. This effect size can be the difference between two means or two proportions, the ratio of two means, an odds ratio, a relative risk ratio, or a hazard ratio, among others. The complete absence of any effect corresponds to a difference of 0, or a ratio of 1, so I call these the “no-effect” values.

remember.eps The following are always true:

check.png If the 95 percent CI around the observed effect size includes the no-effect value (0 for differences, 1 for ratios), then the effect is not statistically significant (that is, a significance test for that effect will produce p > 0.05).

check.png If the 95 percent CI around the observed effect size does not include the no-effect value, then the effect is statistically significant (that is, a significance test for that effect will produce p < 0.05).

The same kind of correspondence is true for other confidence levels and significance levels: 90 percent confidence levels correspond to the p = 0.10 significance level, 99 percent confidence levels correspond to the p = 0.01 significance level, and so on.
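You can verify this correspondence numerically; this sketch runs a one-sample t test on made-up paired differences and checks that the 95 percent CI excludes the no-effect value (0) exactly when p < 0.05 (the confidence_interval method needs a recent version of SciPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
diffs = rng.normal(1.0, 3.0, size=40)  # hypothetical paired differences

res = stats.ttest_1samp(diffs, popmean=0)
ci = res.confidence_interval(confidence_level=0.95)

# The 95% CI excludes 0 exactly when the test gives p < 0.05, and vice versa
excludes_zero = ci.low > 0 or ci.high < 0
significant = res.pvalue < 0.05
```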

So you have two different, but related, ways to prove that some effect is present — you can use significance tests, and you can use confidence intervals. Which one is better? The two methods are consistent with each other, but many people prefer the CI approach to the p-value approach. Why?

check.png The p value is the result of the complex interplay between the observed effect size, the sample size, and the size of random fluctuations, all boiled down into a single number that doesn’t tell you whether the effect was large or small, clinically important or negligible.

check.png The CI around the mean effect clearly shows you the observed effect size, along with an indicator of how uncertain your knowledge of that effect size is. It tells you not only whether the effect is statistically significant, but also can give you an intuitive sense of whether the effect is clinically important.

The CI approach lends itself to a very simple and natural way of comparing two products for equivalence or noninferiority, as I explain in Chapter 16.
