Chapter 15

Analyzing Incidence and Prevalence Rates in Epidemiologic Data

In This Chapter

• Determining and expressing how prevalent a condition is

• Calculating incidence rates, rate ratios, and their standard errors

• Comparing incidence rates between two populations

• Estimating sample size needed to compare incidence rates

Epidemiology studies the patterns, causes, and effects of health and disease in defined populations (sometimes very large populations, like entire cities or countries, or even the whole world). This chapter describes two concepts, prevalence and incidence, that are central to epidemiology and are frequently encountered in other areas of biological research as well. I describe how to calculate incidence rates and prevalence proportions. Then, concentrating on the analysis of incidence (because prevalence can be analyzed using methods described elsewhere in this book), I describe how to calculate confidence intervals around incidence rates and rate ratios, how to compare incidence rates between two populations, and how to estimate the sample size needed for such a comparison.

Understanding Incidence and Prevalence

Incidence and prevalence are two related but distinct concepts. In the following sections, I define each of these concepts and provide examples; then I describe the relationship between incidence and prevalence.

Prevalence: The fraction of a population with a particular condition

Remember: The prevalence of a condition in a population is the proportion of the population that has that condition at any given moment. It’s calculated by dividing the number of people in a defined group who have the condition by the total number of people in that group.

Prevalence can be expressed as a decimal fraction, a percentage, or a “one out of so many” kind of number. For example, a 2011 survey found that 11.3 percent of the U.S. adult population had diabetes. So the prevalence of diabetes in U.S. adults can be expressed as the decimal 0.113, as 11.3 percent, or as roughly 1 out of 9.

Because prevalence is a simple proportion, it’s analyzed in exactly the same way as any other proportion. The standard error of a prevalence can be estimated by the formulas in Chapter 9; confidence intervals can be obtained from exact methods based on the binomial distribution or from formulas based on the normal approximation to the binomial distribution (see Chapter 10); and prevalence can be compared between two or more populations using the chi-square or Fisher Exact test (see Chapter 13). For this reason, the remainder of this chapter focuses on how to analyze incidence rates.
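The normal-approximation route mentioned above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard formula SE = √(p(1 – p)/n) for a proportion and a Wald interval p ± 1.96 × SE; the function name and the counts in the usage line (113 of 1,000 adults) are hypothetical and are not taken from the survey cited above. An exact binomial interval needs a beta-distribution quantile function (for example, from SciPy) and isn’t shown here.

```python
import math


def prevalence_with_ci(cases: int, population: int, z: float = 1.96):
    """Prevalence proportion with its standard error and a normal-approximation
    (Wald) confidence interval; z = 1.96 gives a 95 percent interval."""
    if population <= 0 or not 0 <= cases <= population:
        raise ValueError("need 0 <= cases <= population and population > 0")
    p = cases / population
    se = math.sqrt(p * (1 - p) / population)
    # Clip to [0, 1], because the normal approximation can overshoot near the edges.
    lower = max(0.0, p - z * se)
    upper = min(1.0, p + z * se)
    return p, se, lower, upper


# Hypothetical survey: 113 of 1,000 sampled adults have the condition.
p, se, lo, hi = prevalence_with_ci(113, 1000)
print(f"prevalence {p:.3f}, SE {se:.4f}, 95% CI {lo:.3f} to {hi:.3f}")
```

Because the interval is symmetric on the proportion scale, it can spill below 0 or above 1 for rare or very common conditions; the clipping only hides that, which is one reason exact methods are preferred for small counts.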

Incidence: Counting new cases

Remember: The incidence of a condition is the rate at which new cases of that condition appear in a population. Incidence is generally expressed as an incidence rate (R), defined as the number of observed events (N) divided by the exposure (E): R = N/E. Exposure is the number of subjects in the population multiplied by the length of the time interval during which new events are being counted. It’s measured in units of person-time, such as person-days or person-years, so incidence rates are expressed as the number of cases per unit of person-time. The unit of person-time is often chosen so that the incidence rate will be a “conveniently sized” number.

Tip: The incidence rate should be estimated by counting events over an interval of time short enough that the number of observed events is a small fraction of the total population studied. One year is short enough for diabetes (only 0.02 percent of the population develops diabetes in a year), but it isn’t short enough for something like the flu, which as much as 30 percent of the population may come down with in a one-year period.

Suppose that last year, in my city with 300,000 adults, 30 adults were newly diagnosed with diabetes. The incidence of adult diabetes in my city would be calculated as 30 cases in 300,000 adults in one year, which works out to 0.0001 new cases per person-year. To avoid working with tiny fractional numbers like 0.0001 (humans usually prefer to work with numbers between 1 and 1,000 whenever possible), it’s more convenient to express this incidence rate as 1 new case per 10,000 person-years, or perhaps as 10 new cases per 100,000 person-years. Similarly, say that in my cousin’s city, with 80,000 adults, 24 adults were newly diagnosed with diabetes. The incidence rate would be calculated as 24 cases in 80,000 people in one year, which works out to 24/80,000 or 0.0003 new cases per person-year and is expressed more conveniently as 30 new cases per 100,000 person-years. So the incidence rate in my cousin’s city is three times as large as the incidence rate in my city.
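The arithmetic in this example can be checked with a short Python sketch. It assumes every adult was observed for the whole year, so exposure is simply persons × years; the function name incidence_rate and its per parameter (the number of person-years the rate is expressed per) are illustrative, not from any library.

```python
def incidence_rate(events: int, persons: float, years: float, per: float = 100_000) -> float:
    """Incidence rate R = N / E, where exposure E = persons x years (person-years),
    rescaled to events per `per` person-years."""
    exposure = persons * years  # person-years, assuming everyone is followed for `years`
    return events / exposure * per


my_city = incidence_rate(events=30, persons=300_000, years=1)
cousin_city = incidence_rate(events=24, persons=80_000, years=1)
print(f"my city: {my_city:.1f} per 100,000 person-years")
print(f"cousin's city: {cousin_city:.1f} per 100,000 person-years")
print(f"cousin's city / my city: {cousin_city / my_city:.1f}")
```

Changing per to 10_000 expresses the same rates per 10,000 person-years; the ratio between the two cities doesn’t depend on the unit chosen.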

Understanding how incidence and prevalence are related

From the definitions and examples in the preceding sections, you see that incidence and prevalence are two related but distinct concepts. The incidence rate tells you how fast new cases of some condition arise in a population, and prevalence tells you what fraction of the population has that condition at any moment.

You might expect that conditions with higher incidence rates would have higher prevalence than conditions with lower incidence rates, and that tends to be true when comparing conditions that last for about the same amount of time. But there are counterexamples: short-lasting conditions (such as acute infections) may have high incidence rates but low prevalence, whereas long-lasting conditions (diabetes, for example) may have low incidence rates but high prevalence.
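One way to quantify this relationship is a commonly cited epidemiological approximation. It assumes a steady state (a stable population in which the incidence rate I, the prevalence P, and the average duration D of the condition don’t change over time), with I measured among people who don’t yet have the condition:

```latex
\frac{P}{1 - P} = I \times D
\qquad \Longrightarrow \qquad
P \approx I \times D \quad \text{when } P \text{ is small}
```

Under these assumptions, for a fixed incidence rate, a condition that lasts ten times as long has roughly ten times the prevalence, which is why a short-lived infection can have a high incidence rate yet a low prevalence.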

Analyzing Incidence Rates

The preceding sections show you how to calculate incidence rates and express them in convenient units. But, as I emphasize in Chapter 9, whenever you report a number you’ve calculated, you should also indicate how precise that number is. So how precise are those incidence rates? And is the difference between two incidence rates significant? The next sections show you how to calculate standard errors and confidence intervals for incidence rates and how to compare incidence rates between two populations.

Expressing the precision of an incidence rate

The precision of an incidence rate (R) is usually expressed by a confidence interval (CI). The standard error (SE) of R isn’t often quoted, because the event rate usually isn’t normally distributed; the standard error is usually calculated only as part of the confidence interval calculation.

Random fluctuations in R are usually attributed entirely to fluctuations in the event count (N), assuming the exposure (E) is known exactly — or at least much more precisely than N. So the confidence interval for the event rate is based on the confidence interval for N. Here’s how you calculate the confidence interval for R:

1. Calculate the confidence interval (CI) for N.

Chapters 9 and 10 provide approximate standard error and confidence interval formulas, based on the normal approximation to the Poisson distribution (see Chapter 25). These approximations are reasonably good when N is large (at least 50 events):

$\mathrm{SE}_N \approx \sqrt{N}, \qquad 95\%\ \mathrm{CI} \approx N \pm 1.96\sqrt{N}$

Better yet, you can get the exact Poisson confidence interval around an event count by using software, such as the online calculator at StatPages.info/confint.html.

2. Divide the lower and upper confidence limits for N by the exposure (E).

The answer is the confidence interval for the incidence rate R.

For the example of 24 new diabetes cases in one year in a city with 80,000 adults, the event count (N) is 24, and the exposure (E) is 80,000 person-years (80,000 persons for one year). The incidence rate (R) is N/E, which is 24 per 80,000 person-years, or 30 per 100,000 person-years. How precise is the incidence rate?

First find the confidence limits for N. Using the approximate formula, the 95 percent confidence interval (CI) around the event count of 24 is $24 \pm 1.96\sqrt{24} = 24 \pm 9.6$, or 14.4 to 33.6 events. Dividing the lower and upper confidence limits of N by the exposure gives 14.4/80,000 to 33.6/80,000, which you can express as 18.0 to 42.0 events per 100,000 person-years; this is the confidence interval for the incidence rate.

Using the exact online calculator, you get 15.4 to 35.7 as the 95 percent confidence interval around 24 observed events. Dividing these numbers by the 80,000 person-years of exposure gives 19.2 to 44.6 events per 100,000 person-years as the exact 95 percent confidence interval around the incidence rate.
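Both steps, with both kinds of confidence interval, can be sketched in Python using only the standard library. This is a minimal sketch, not necessarily the algorithm the online calculator uses: the exact limits here are the standard Garwood (exact Poisson) limits, found by bisection on the Poisson cumulative distribution, and all function names are hypothetical. The term-by-term Poisson sum is adequate for moderate counts but underflows for very large ones, where a statistics library such as SciPy is a better choice.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed term by term.
    Fine for moderate mu; math.exp(-mu) underflows for mu above roughly 700."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # P(X = i) from P(X = i - 1)
        total += term
    return total


def _solve_decreasing(f, target: float, lo: float, hi: float) -> float:
    """Bisection for f(mu) = target, where f decreases as mu increases."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(n: int, alpha: float = 0.05):
    """Step 1 (exact): two-sided Garwood confidence limits for a Poisson count n."""
    bracket = n + 20 * math.sqrt(n) + 20  # comfortably above the upper limit
    # Lower limit: P(X >= n) = alpha/2, i.e. P(X <= n - 1) = 1 - alpha/2.
    lower = 0.0 if n == 0 else _solve_decreasing(
        lambda mu: poisson_cdf(n - 1, mu), 1 - alpha / 2, 0.0, bracket)
    # Upper limit: P(X <= n) = alpha/2.
    upper = _solve_decreasing(lambda mu: poisson_cdf(n, mu), alpha / 2, 0.0, bracket)
    return lower, upper


def approx_poisson_ci(n: int, z: float = 1.96):
    """Step 1 (approximate): N - z*sqrt(N) to N + z*sqrt(N)."""
    half_width = z * math.sqrt(n)
    return n - half_width, n + half_width


def rate_ci(limits, exposure: float, per: float = 100_000):
    """Step 2: divide the count limits by the exposure, rescaled per `per` units."""
    return tuple(x / exposure * per for x in limits)


n, exposure = 24, 80_000  # 24 cases in 80,000 person-years
print("approx:", [round(x, 1) for x in rate_ci(approx_poisson_ci(n), exposure)])
print("exact: ", [round(x, 1) for x in rate_ci(exact_poisson_ci(n), exposure)])
```

Note that the approximate lower limit can go negative for very small counts, whereas the exact lower limit never does (it is 0 when N = 0).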

Comparing incidences with the rate ratio

Remember: When comparing incidence rates between two populations, you should calculate a rate ratio (RR) by dividing one incidence rate by the other. So for two groups with event counts N1 and N2, exposures E1 and E2, and incidence rates R1 and R2, respectively, you calculate the rate ratio for Group 2 relative to Group 1 as a reference, like this:

$RR = \frac{R_2}{R_1} = \frac{N_2/E_2}{N_1/E_1}$

For the example of diabetes incidence in the two cities, you have N1 = 30, E1 = 300,000, N2 = 24, and E2 = 80,000. The RR for my cousin’s city relative to my city is RR = (24/80,000)/(30/300,000), or 3.0, indicating that my cousin’s city has three times the diabetes incidence that my city has.

You could calculate the difference (R2 – R1) between two incidence rates if you wanted to, but in epidemiology, rate ratios are used much more often than rate differences.

Calculating confidence intervals for a rate ratio

Whenever you report a rate ratio you’ve calculated, you should also indicate how precise that ratio is. The exact calculation of a confidence interval (CI) around a rate ratio is quite difficult, but if your observed event counts aren’t too small (say, ten or more), then the following approximate formula for the 95 percent CI around an RR works reasonably well:

95% CI = RR/Q to RR × Q

where $Q = e^{1.96\sqrt{1/N_1 + 1/N_2}}$.

For other confidence levels, replace the 1.96 in the Q formula with the appropriate critical z value for the normal distribution (see Chapter 26).

So for the diabetes example (where N1 = 30, N2 = 24, and RR = 3.0), $Q = e^{1.96\sqrt{1/30 + 1/24}} = 1.71$, so the 95 percent CI goes from 3.0/1.71 to 3.0 × 1.71, or from 1.75 to 5.13.
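The rate ratio and its approximate confidence interval take only a few lines of Python. This sketch assumes Q = exp(1.96 × √(1/N1 + 1/N2)) with 1.96 as the default critical value; the function name rate_ratio_ci is hypothetical. The returned Q and its reciprocal 1/Q correspond to the right-side and left-side numbers you’d read from the nomogram.

```python
import math


def rate_ratio_ci(n1: int, e1: float, n2: int, e2: float, z: float = 1.96):
    """Rate ratio RR = (N2/E2) / (N1/E1) for Group 2 relative to reference Group 1,
    with the approximate CI RR/Q to RR*Q, where Q = exp(z * sqrt(1/N1 + 1/N2)).
    Intended for event counts that aren't too small (say, ten or more each)."""
    rr = (n2 / e2) / (n1 / e1)
    q = math.exp(z * math.sqrt(1 / n1 + 1 / n2))
    return rr, q, rr / q, rr * q


rr, q, lo, hi = rate_ratio_ci(n1=30, e1=300_000, n2=24, e2=80_000)
print(f"RR = {rr:.2f}, Q = {q:.2f}, 1/Q = {1 / q:.3f}, 95% CI {lo:.2f} to {hi:.2f}")
```

For a 90 or 99 percent interval, pass the corresponding critical z value instead of 1.96.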

The calculations are even simpler if you use the nomogram in Figure 15-1. You lay a ruler (or stretch a string) between the values of N1 and N2 on the left and right scales, and then read off the two values from the left and right sides of the center scale. The right-side number is Q; the left-side number is 1/Q. So, if you multiply the observed RR by these two numbers, you have the 95 percent confidence limits. It doesn’t get much easier than that!

For the diabetes example, a ruler placed on 30 and 24 on the left and right scales crosses the center scale at the numbers 0.585 and 1.71, consistent with the preceding calculations.

Comparing two event rates

Two event rates (R1 and R2), based on N1 and N2 events and E1 and E2 exposures, can be tested for a significant difference by calculating the 95 percent confidence interval (CI) around the rate ratio (RR) and observing whether that CI crosses the value 1.0 (which indicates identical rates). If the 95 percent CI includes 1, the RR isn’t significantly different from 1, so the two rates aren’t significantly different from each other (p > 0.05). But if the 95 percent CI doesn’t include 1, the RR is significantly different from 1, so the two rates are significantly different from each other (p < 0.05).

Illustration by Wiley, Composition Services Graphics

Figure 15-1: Nomogram to calculate a 95 percent confidence interval around a rate ratio.

For the diabetes example, the observed rate ratio was 3.0, with a 95 percent confidence interval of 1.75 to 5.13, which doesn’t include the value 1. So the rate ratio is significantly greater than 1, and I would conclude that my cousin’s city has a significantly higher diabetes incidence rate than my city (p < 0.05).

This test is very easy to do (requiring no calculations at all!) using the Figure 15-1 nomogram. Just lay the ruler across the N1 and N2 values on the left and right scales, and look at the numbers on the center scale. If your observed RR is lower than the left-side number or higher than the right-side number, the two event rates are significantly different from each other (the RR is significantly different from 1.0), at the p < 0.05 level.

For the diabetes example, a ruler placed on 30 and 24 on the left and right scales crosses the center scale at the numbers 0.585 and 1.71, so any observed rate ratio outside of this range is significantly different from 1, consistent with the foregoing calculations.
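The same check can be written as a function. This minimal sketch, with a hypothetical function name, flags a significant difference when RR falls outside 1/Q to Q (equivalently, when the 95 percent CI around RR excludes 1). Going slightly beyond the text, it also reports an approximate two-sided p-value from z = ln(RR)/√(1/N1 + 1/N2), the same normal approximation on the log scale that underlies the Q formula.

```python
import math


def compare_rates(n1: int, e1: float, n2: int, e2: float, z_crit: float = 1.96):
    """Test whether two event rates differ, using the log rate ratio.

    With z_crit = 1.96, the rates differ at p < 0.05 when RR falls outside
    1/Q to Q, i.e. when the CI RR/Q to RR*Q excludes 1. Also returns an
    approximate two-sided p-value from the same normal approximation."""
    rr = (n2 / e2) / (n1 / e1)
    se_log_rr = math.sqrt(1 / n1 + 1 / n2)
    q = math.exp(z_crit * se_log_rr)
    significant = rr < 1 / q or rr > q
    z = math.log(rr) / se_log_rr
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return rr, significant, p_value


print(compare_rates(30, 300_000, 24, 80_000))  # the diabetes example
print(compare_rates(30, 1, 40, 1))  # 30 vs. 40 events over equal exposures
```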

Comparing two event counts with identical exposure

If — and only if — the two exposures (E1 and E2) are identical, there’s a really simple rule for testing whether two event counts (N1 and N2) are significantly different at the p < 0.05 level:

If $\frac{(N_1 - N_2)^2}{N_1 + N_2} > 4$, then the Ns are significantly different (p < 0.05).

Remember: If the square of the difference is more than four times the sum, then the numbers are significantly different (p < 0.05).

The value 4 appearing in this rule is an approximation to 3.84, the critical chi-square value for 1 degree of freedom at p = 0.05 (see Chapter 25 for more info).

For example, if there were 30 fatal auto accidents in your town last year and 40 this year, are things getting more dangerous, or was the increase just random fluctuation? Using the simple rule, you calculate (30 – 40)²/(30 + 40) = 100/70 = 1.4, which is less than 4. Having 30 events isn’t significantly different from having 40 events (during equal time intervals), so the apparent increase could just be “noise.” But had the number of events gone from 30 to 50, the jump would have been significant because (30 – 50)²/(30 + 50) = 400/80 = 5.0, which is greater than 4.
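The rule of thumb translates directly into code. In this sketch the function name counts_differ is hypothetical, and the threshold defaults to 4 as in the rule; passing 3.84 uses the chi-square critical value instead. Like the rule itself, it applies only when the two exposures are identical.

```python
def counts_differ(n1: int, n2: int, threshold: float = 4.0) -> bool:
    """Rule of thumb for two event counts observed over identical exposures:
    significant at about p < 0.05 if (N1 - N2)^2 / (N1 + N2) > threshold."""
    if n1 + n2 == 0:
        return False  # no events at all: nothing to compare
    return (n1 - n2) ** 2 / (n1 + n2) > threshold


for before, after in [(30, 40), (30, 50)]:
    stat = (before - after) ** 2 / (before + after)
    print(before, after, round(stat, 1), counts_differ(before, after))
```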

Estimating the Required Sample Size

Sample-size calculations for rate comparisons are more difficult than you want to attempt by hand. Instead, you can use the Excel file SampleSizeCalcs.xls, available at www.dummies.com/extras/biostatistics. You may want to review Chapter 3 for a refresher on the concepts of power and sample size.

As in all sample-size calculations, you need to specify the desired statistical power (often set to 80 percent) and the alpha level for the test (often set to 0.05). When comparing event rates (R1 and R2) between two groups, considering R1 to be the reference group, you must also specify

• The expected rate in the reference group (R1)

• The effect size of importance, expressed as the rate ratio RR = R2/R1 and entered into the spreadsheet as the expected value of R2

• The expected ratio of exposure in the two groups (E2/E1), entered into the GroupSize Ratio field of the spreadsheet

When you enter the necessary parameters into the spreadsheet, it will give you the required exposure for each group and the number of events you can expect to see in each group.

For example, suppose you’re designing an experiment to test whether the incidence of rotavirus gastroenteritis is higher in inner cities than in the suburbs. You’ll enroll an equal number of inner-city and suburban residents and follow them for one year to see whether they come down with rotavirus. Say that the incidence of rotavirus in suburbia is known to be 1 case per 100 person-years (an incidence rate of 0.01 case per person-year). You want to have an 80 percent chance of getting a significant result of p < 0.05 (that is, 80 percent power at 0.05 alpha) when comparing the incidence rates between the two areas if they differ by 25 percent or more (that is, a rate ratio of at least 1.25).

You’d fill in the fields of the spreadsheet, as shown in Figure 15-2.

The spreadsheet calculates that you need more than 28,000 person-years of observation in each group (a total of almost 57,000 subjects in a one-year study) in order to have enough observed events to have an 80 percent chance of getting a significant result when comparing rotavirus incidence between inner-city and suburban residents. This shockingly large requirement illustrates the difficulty of studying the incidence rates of rare illnesses.
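If you don’t have the spreadsheet at hand, one common normal-approximation formula based on the variance of the log rate ratio gives a rough answer. This is a sketch under stated assumptions, not the spreadsheet’s method, which may differ: it uses Var(ln RR) ≈ 1/N1 + 1/N2 with expected counts N1 = R1 × E1 and N2 = R2 × E2, and the function name exposure_per_group is hypothetical. For the rotavirus example (R1 = 0.01, RR = 1.25, equal exposures, 80 percent power, alpha = 0.05) it gives a little over 28,000 person-years per group, in line with the spreadsheet result quoted above.

```python
import math
from statistics import NormalDist


def exposure_per_group(r1: float, rr: float, ratio: float = 1.0,
                       alpha: float = 0.05, power: float = 0.80):
    """Approximate exposures E1 (reference group) and E2 = ratio * E1 needed to
    detect rate ratio rr = R2/R1 with a two-sided test on ln(RR).

    Uses Var(ln RR) ~ 1/N1 + 1/N2 with N1 = R1*E1 and N2 = R2*E2, giving
    E1 = (z_{1-alpha/2} + z_{power})^2 * (1/R1 + 1/(ratio*R2)) / (ln rr)^2."""
    r2 = r1 * rr
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    e1 = z ** 2 * (1 / r1 + 1 / (ratio * r2)) / math.log(rr) ** 2
    e2 = ratio * e1
    return e1, e2, r1 * e1, r2 * e2  # exposures and expected event counts


e1, e2, n1, n2 = exposure_per_group(r1=0.01, rr=1.25)
print(f"E1 = {e1:,.0f} and E2 = {e2:,.0f} person-years; "
      f"expected events about {n1:.0f} and {n2:.0f}")
```

The ratio argument plays the role of the spreadsheet’s GroupSize Ratio field (E2/E1); unequal exposures change how the total is split between the groups.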

Illustration by Wiley, Composition Services Graphics

Figure 15-2: Calculating the sample size required to compare two incidence rates.
