Chapter 22: Summarizing and Graphing Survival Data

In This Chapter

Beginning with the basics of survival data

Trying life tables and the Kaplan-Meier method

Applying some handy guidelines for survival analysis

Using survival data for even more calculations

This chapter describes statistical techniques that deal with a special kind of numerical data — the interval from some starting point in time (such as a diagnosis date or procedure date) to the first (or only) occurrence of some particular kind of endpoint event. Because these techniques are so often applied to situations where the endpoint event is death, we usually call the use of these techniques survival analysis, even when the endpoint is something less drastic than death, like relapse, or even something desirable — for example, time to remission of cancer or time to recovery. Throughout this chapter, I use terms and examples that imply that the endpoint is death (like survival time instead of time to event), but everything I say also applies to other kinds of endpoints.

You may wonder why you need a special kind of analysis for survival data in the first place. Why not just treat survival times as ordinary numerical variables? Why not summarize them as means, medians, standard deviations, and so on, and graph them as histograms and box-and-whiskers charts? Why not compare survival times between groups with t tests and ANOVAs? Why not use ordinary least-squares regression to explore how various factors influence survival time?

In this chapter, I explain how survival data isn’t like ordinary numerical data and why you need to use special techniques to analyze it properly. I describe two ways to construct survival curves: the life table and the Kaplan-Meier methods. I tell you what to watch out for when preparing and interpreting survival curves, and I show you how to glean useful information from these curves, such as median survival time and five-year survival rates.

Understanding the Basics of Survival Data

To understand survival analysis, you first have to understand survival data — that survival times are intervals between certain kinds of events, that these intervals are often affected by a peculiar kind of “partial missingness” called censoring, and that censored data must be analyzed in a special way to avoid biased estimates and incorrect conclusions.

Knowing that survival times are intervals

The techniques described in this chapter for summarizing, graphing, and comparing survival times deal with the time interval from a defined starting point to the first occurrence of an endpoint event. The event can be death; a relapse, like a recurrence of cancer; or the failure of a mechanical component, like a heart valve failure that requires an explant (surgical removal). For example, if a person had a heart valve implanted on January 10, but the body rejected the valve and it had to be removed on January 30, then the time interval from implant to explant is 30 – 10, or 20 days.

A person can die only once, but for other endpoints that can occur multiple times, such as stroke or seizure, the techniques I describe deal with only the first occurrence of the event. More advanced survival analysis methods, which can handle repeated occurrences, are beyond the scope of this book.

The starting point of the time interval is somewhat arbitrary, so it must be defined explicitly for every survival analysis. For example: If you’re evaluating the natural history of some disease, like cancer or chronic obstructive pulmonary disease (COPD), the starting point can be the diagnosis date. If you’re evaluating the efficacy of a treatment, the starting point is often defined as the date the treatment began.

Recognizing that survival times aren’t normally distributed

Even though survival times are continuous or nearly continuous numerical quantities, they’re almost never normally distributed. Because of this, it’s generally not a good idea to use

Means and standard deviations to describe survival times

T tests and ANOVAs to compare survival times between groups

Least-squares regression to investigate how survival time is influenced by other factors

If non-normality were the only problem with survival data, you’d be able to summarize survival times as medians and centiles instead of means and standard deviations, and you could compare survival between groups with nonparametric Mann-Whitney and Kruskal-Wallis tests instead of t tests and ANOVAs. But time-to-event data is susceptible to a special situation called censoring, which the usual parametric and nonparametric methods can’t handle. So special methods have been developed to analyze censored data properly.

Considering censoring

What sets survival data apart from other kinds of numerical data is that, in many studies, you may not know the exact time of death (or other endpoint) for some subjects. This can happen in two general ways:

You may not (and usually don’t) have the luxury of observing every subject until he dies. Because of time constraints, at some point you have to end the study and analyze your data, while some of the subjects are still alive. You don’t know how much longer these subjects will ultimately live; you know only that they were still alive up to the last time you or your colleagues saw them alive (such as at a clinic visit) or communicated with them in some way (such as a follow-up phone call). This is called the date of last contact or the last-seen date.

You may lose track of some subjects during the study. Subjects can drop out of a study or leave town, never to be heard from again, becoming lost to follow-up (LFU). You don’t know whether they’re alive or dead now; you know only that they were alive at the date of last contact, before you lost track of them.

You can describe these two situations in a general way. You know that each subject either died on a certain date or was definitely alive up to some last-seen date (and you don’t know how far beyond that date he may ultimately have lived). The latter situation is called a censored observation.
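To make the idea concrete, here is a minimal Python sketch of how one subject's record might be reduced to a time plus a died-or-censored flag. The names Observation and make_observation, the 365.25-day year, and the dates are all assumptions made up for illustration; they don't come from the study in this chapter.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Observation:
    years: float  # survival time or censoring time, in years
    died: bool    # True = death observed, False = censored


def make_observation(start: date, death: Optional[date], last_seen: date) -> Observation:
    """Use the death date if known; otherwise fall back to the last-seen date."""
    end = death if death is not None else last_seen
    years = (end - start).days / 365.25  # approximate conversion to years
    return Observation(years=years, died=death is not None)


# Hypothetical subjects (dates invented for illustration)
a = make_observation(date(2000, 3, 1), date(2002, 6, 15), date(2002, 6, 15))
b = make_observation(date(2001, 5, 10), None, date(2006, 12, 31))

print(a)  # died=True: an actual survival time
print(b)  # died=False: a censored time (alive at least this long)
```

The point of keeping the flag alongside the time is that a censored time still carries information: the subject survived at least that long.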

Figure 22-1 shows the results of a small study of survival in cancer patients after a surgical procedure to remove the tumor. Ten subjects were recruited and enrolled at the time of their surgery, during the period from Jan. 1, 2000, to the end of Dec. 31, 2001 (two years of enrollment). They were then followed until they died or until the conclusion of the study on Dec. 31, 2006 (five years of additional observation after the last enrollment). Each subject has a horizontal timeline that starts on the date of surgery and ends with either the death date or the censoring date.

Illustration by Wiley, Composition Services Graphics

Figure 22-1: Survival of ten subjects following surgery for cancer.

Six of the ten subjects (#1, 2, 4, 6, 9, and 10) died during the course of the follow-up study; two subjects (#5 and 7) were lost to follow-up at some point during the study, and two subjects (#3 and 8) were still alive at the end of the study. So this study has four subjects with censored survival times.

So how do you handle censored data like this? The following sections explain the right and wrong ways to proceed.

Dealing with censored data the right way

Statisticians have worked out the proper techniques to utilize the partial information contained in censored observations. I describe two of the most popular techniques later in this chapter — the life-table method and the Kaplan-Meier (K-M) method. To understand these methods, you need to understand two fundamental concepts — hazard and survival:

The hazard rate is the probability of dying in the next small interval of time, assuming the subject is alive right now.

The survival rate is the probability of living for a certain amount of time after some starting time point.

The first task when analyzing survival data is usually to describe how the hazard and survival rates vary with time. In this chapter, I show how to estimate the hazard and survival rates, and how to summarize them as tables and display them as graphs.

Most of the larger statistical packages (such as those in Chapter 4) provide the kinds of calculations I describe, so you may never have to work directly with a life table or perform a Kaplan-Meier calculation. But it’s almost impossible to understand any aspect of survival analysis, from simple descriptive summaries to advanced analytical techniques, without first understanding how these two methods work.

Dealing with censored data the wrong way

Here are two ways not to handle censored survival data:

You shouldn’t exclude subjects with a censored survival time from any survival analysis.

You shouldn’t impute (replace) the censored (last-seen) date with some reasonable substitute value. One commonly used imputation scheme is to replace a missing value with the last observed value for that subject (called last observation carried forward, or LOCF imputation). So you may be tempted to set the death date to the last-seen date for a subject who didn’t die during the observation period.

These techniques for dealing with missing data don’t work for censored data. You can see why in Figure 22-2, in which the timelines for all the subjects have been slid to the left, as if they all had their surgery on the same date. The time scale now shows survival time (in years) after surgery instead of chronological time.

If you simply exclude all subjects with censored death dates from your analysis, you may be left with too few analyzable subjects (there are only six uncensored subjects in this example), which weakens (underpowers) your study. Worse, it will also bias your results in subtle and unpredictable ways.

Using the last-seen date in place of the death date for a censored observation may seem like a legitimate use of LOCF imputation, but it’s not. It’s equivalent to assuming that any subject who isn’t known to have died must have died immediately after the last-contact date. But this assumption isn’t reasonable — some subjects may live many years beyond the date you last saw them. Simply substituting last-seen dates for missing death dates will bias your results toward shorter survival times.

The problem is that a censored observation time isn’t really missing; it’s just not completely known. If you know that a person was last seen alive three years after treatment, you have partial information for that patient. You don’t know exactly what the patient’s true survival time is, but you do know that it’s at least three years.

Illustration by Wiley, Composition Services Graphics

Figure 22-2: Survival times from the date of surgery.

Looking at the Life-Table Method

To estimate survival and hazard rates in a population from a set of observed survival times, some of which are censored, you must combine the information from censored and uncensored observations properly. How do you do that? First, forget about trying to get survival estimates simply by dividing the number of subjects alive at a certain time point by the total number of subjects in the study. That approach fails to account properly for the censored observations.

Instead, you have to think of the process in terms of a series of small slices of time, and think of the probability of making it through each time slice, assuming that the subject is alive at the start of that slice. The cumulative survival probability can then be obtained by successively multiplying all these individual time-slice survival probabilities together. For example, to survive three years, first the subject has to make it through Year 1, then she has to make it through Year 2, and then she has to make it through Year 3. The probability of making it through all three years is the product of the probabilities of making it through Year 1, Year 2, and Year 3.

These calculations can be laid out very systematically in a life table, sometimes called an actuarial life table because of its early use by insurance companies. The calculations involve only addition, subtraction, multiplication, and division and are simple enough to do by hand (which is how people did them before computers came along). They can also be set up very easily in a spreadsheet, and many life-table templates are freely available for Excel and other spreadsheet programs.

The following sections explain how to create, interpret, and graph information from a life table.

Making a life table

To create a life table from your survival-time data, first break the entire range of survival times into convenient slices (months, quarters, or years, depending on the time scale of the event you’re studying). You should try to have at least five slices; otherwise, your survival and hazard estimates will be too coarse to show any useful features. Having very fine slices doesn’t hurt the calculations, although the table will have more rows and may become unwieldy. For the survival times shown in Figure 22-2, a natural choice would be to use seven one-year time slices.

Next, count how many people died during each slice and how many were censored (that is, last seen alive during that slice, either because they became lost to follow-up or were still alive at the end of the study). From Figure 22-2, you see that

During the first year after surgery, one subject died (#1), and one subject was censored (#5, who was lost to follow-up).

During the second year, nothing happened (no deaths, no censoring).

During the third year, two subjects died (#4 and 9), and none were censored.

Continue tabulating deaths and censored times for the fourth through seventh years, and enter these counts into the appropriate cells of a spreadsheet like the one shown in Figure 22-3:

Put the description of the time interval that defines each slice into Column A.

Enter the total number of subjects alive at the start into Column B, in the 0–1 yr row.

Enter the counts of people who died within each time slice into Column C (Died).

Enter the counts of people who were censored during each time slice into Column D (Last Seen Alive). Some statisticians prefer to split the censored subject counts into two separate columns — one for those lost to follow-up, and another for those still alive at the end of the study. This practice makes the table a little more informative but isn’t really necessary, because only the total number of censored subjects in each interval is used in the calculations. So it’s a matter of personal preference; in this example, I use a single column for all censored counts.

Illustration by Wiley, Composition Services Graphics

Figure 22-3: A life table to analyze the survival times shown in Figure 22-2.

After you’ve entered all the counts, the spreadsheet will look like Figure 22-3. Then you perform the simple calculations shown in the “Formula” row at the top of the spreadsheet to generate the numbers in all the other cells of the table. In the following sections, I go through all the life-table calculations for this example, column by column.

I go through these calculations step by step to show you how they work, but you should never actually do these calculations yourself. Instead, put the formulas into the cells of a spreadsheet program so that it can do the calculations for you. Of course, if you use a preprogrammed life-table spreadsheet (which is even better), all the formulas will already be in place.

Columns B, C, and D

Column B shows the number of subjects known to be alive at the start of each year after surgery. This is equal to the number of subjects alive at the start of the preceding year minus the number of subjects who died (Column C) or were censored (Column D) during the preceding year. Here’s the formula, written in terms of the column letters: B for any year = B – C – D from the preceding year.

Here’s how this process plays out in Figure 22-3:

Out of the ten subjects alive at the start, one died and one was last seen alive during the first year, so eight subjects (10 – 1 – 1) are known to still be alive at the start of the second year. The missing subject (#5, who was lost to follow-up during the first year) may or may not still be alive, so that censored subject isn’t counted in any subsequent years.

Nobody died or was last seen alive during the second year, so eight subjects are still known to be alive at the start of the third year.

Calculations continue the same way for the remaining years.
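If you'd rather see the Column B rule as code than as spreadsheet cells, here is a small Python sketch. It uses only the counts for the first three years given in the text; the function name alive_at_start is my own invention.

```python
def alive_at_start(n_start, died, censored):
    """Column B: each year starts with last year's B minus last year's C and D."""
    alive = [n_start]
    for d, c in zip(died[:-1], censored[:-1]):
        alive.append(alive[-1] - d - c)
    return alive


died = [1, 0, 2]       # Column C for Years 1 to 3
censored = [1, 0, 0]   # Column D for Years 1 to 3

column_b = alive_at_start(10, died, censored)
print(column_b)  # [10, 8, 8]
```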

Column E

Column E shows the number of subjects “at risk for dying” during each year. You may guess that this is the number of people alive at the start of the interval, but there’s one minor correction. If any people were censored during that year, then they weren’t really “available to die” (to use an awful expression) for the entire year. If you don’t know exactly when, during that year, they became censored, then it’s reasonable to “split the difference” and consider them at risk for only half the year. So the number at risk can be estimated as the number alive at the start of the year, minus one-half of the number who became censored during that year, as indicated by the formula for Column E: E = B – D/2.

Here’s how this formula works in Figure 22-3:

Ten people were alive at the start of Year 1, and one subject was censored during Year 1, so there were, in effect, only 9.5 people at risk of dying during Year 1 (1 divided by 2 is 0.5; subtract 0.5 from 10 to get 9.5).

Eight people were alive at the start of Year 2, with none being censored during Year 2, so all eight people were at risk during Year 2.

Calculations continue in the same way for the remaining years.
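The same rule can be sketched in a couple of lines of Python, again using only the first three years' counts from the text (the variable names are assumptions):

```python
column_b = [10, 8, 8]  # alive at start of Years 1 to 3
column_d = [1, 0, 0]   # censored during Years 1 to 3

# E = B - D/2: censored subjects count as at risk for half the slice
column_e = [b - d / 2 for b, d in zip(column_b, column_d)]
print(column_e)  # [9.5, 8.0, 8.0]
```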

Column F

Column F shows the probability of dying during each interval, assuming the subject has survived up to the start of that interval. This is simply the number of people who died divided by the number of people at risk during each interval, as indicated by the formula for Column F: F = C/E.

Here’s how this formula works in Figure 22-3:

For Year 1, one death out of 9.5 people at risk gives a 1/9.5, or 0.105 probability of dying during Year 1.

Nobody died in Year 2, so the probability of dying during Year 2 (assuming the subject has already survived Year 1) is 0.

Calculations continue in the same way for the remaining years.

Column G

Column G shows the probability of surviving during each interval, assuming the subject has survived up to the start of that interval. Surviving means not dying, so the probability of surviving is simply 1 – the probability of dying, as indicated by the formula for Column G: G = 1 – F.

Here’s how this formula works out in Figure 22-3:

The probability of dying in Year 1 is 0.105, so the probability of surviving in Year 1 is 1 – 0.105, or 0.895.

The probability of dying in Year 2 is 0.000, so the probability of surviving in Year 2 is 1 – 0.000, or 1.000.

Calculations continue in the same way for the remaining years.
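Columns F and G go together naturally, so here is a short Python sketch of both for the first three years. The Year 3 figures follow from the counts given earlier (two deaths among eight at risk); the variable names are my own.

```python
column_c = [1, 0, 2]         # died during Years 1 to 3
column_e = [9.5, 8.0, 8.0]   # at risk during Years 1 to 3

column_f = [c / e for c, e in zip(column_c, column_e)]  # F = C / E
column_g = [1 - f for f in column_f]                    # G = 1 - F

for year, (f, g) in enumerate(zip(column_f, column_g), start=1):
    print(f"Year {year}: P(die) = {f:.3f}, P(survive) = {g:.3f}")
```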

Column H

Column H shows the cumulative probability of surviving from the time of the operation all the way through the end of this time slice. To survive from the time of the operation through the end of any given year (Year N), the subject must survive each of the years from Year 1 through Year N. Because each value in Column G is the probability of surviving that year given survival up to its start, the probability of surviving all N of the years is the product of the individual years’ probabilities. So Column H is a “running product” of Column G; that is, the value of Column H for Year N is the product of the first N values in Column G.

Here’s what this looks like in Figure 22-3:

For Year 1, H is the same as G: a 0.895 probability of surviving one year.

For Year 2, H is the product of G for Year 1 times G for Year 2; that is, 0.895 × 1.000, or 0.895.

For Year 3, H is the product of the Gs for Years 1, 2, and 3; that is, 0.895 × 1.000 × 0.750, or 0.671.

Calculations continue in the same way for the remaining years.
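A running product is easy to sketch in Python with the standard library's accumulate function. The Column G values below are the first three years' values worked out above; the variable names are assumptions.

```python
from itertools import accumulate
from operator import mul

column_g = [1 - 1 / 9.5, 1.0, 0.75]  # yearly survival, Years 1 to 3

# Column H: running product of Column G
column_h = list(accumulate(column_g, mul))
print([round(h, 3) for h in column_h])  # [0.895, 0.895, 0.671]
```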

Putting everything together

Figure 22-4 shows the spreadsheet with the results of all the preceding calculations.

Illustration by Wiley, Composition Services Graphics

Figure 22-4: Life-table analysis of sample survival data.

It’s also possible to add another couple of columns to the life table to obtain standard errors and confidence intervals for the survival probabilities. The formulas aren’t very complicated (they’re based on the binomial distribution, described in Chapter 25), but I’ve omitted them from this simple example. The SEs are calculated for each year’s survival probability and then combined according to the propagation-of-error rules (see Chapter 11) to get the SEs for the cumulative survival probabilities. Approximate 95 percent CIs are then calculated as 1.96 SEs above and below the survival probability.
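For the curious, here is one way the omitted SE columns could be sketched in Python. I'm assuming the binomial variance F × G / E for each year's survival probability and the usual propagation rule for a product (add up the relative variances), which works out to what's commonly called Greenwood's formula. Treat this as an illustration under those assumptions, not as the exact formulas behind the figure.

```python
import math

died = [1, 0, 2]            # Column C, Years 1 to 3
at_risk = [9.5, 8.0, 8.0]   # Column E, Years 1 to 3

surv = 1.0      # cumulative survival (Column H)
rel_var = 0.0   # running sum of relative variances
results = []
for d, n in zip(died, at_risk):
    f = d / n
    g = 1 - f
    surv *= g
    rel_var += f / (n * g)  # (binomial variance f*g/n) divided by g squared
    se = surv * math.sqrt(rel_var)
    low = max(0.0, surv - 1.96 * se)   # keep the CI inside 0 to 1
    high = min(1.0, surv + 1.96 * se)
    results.append((surv, se, low, high))

for year, (s, se, low, high) in enumerate(results, start=1):
    print(f"Year {year}: S = {s:.3f}, SE = {se:.3f}, CI = {low:.3f} to {high:.3f}")
```

With only ten subjects, the interval around the three-year survival estimate comes out very wide, which is worth remembering when you look at the curve.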

Interpreting a life table

Figure 22-4 contains the hazard rates (in Column F) and the cumulative survival probabilities (in Column H) for each year following surgery, based on your sample of ten subjects. Here are a few features of a life table that you should be aware of:

The sample hazard and survival values obtained from a life table are only sample estimates (in this example, at 1-year time slices) of the true population hazard and survival functions.

The slice widths are often the same for all the rows in a life table (as they are in this example), but they don’t have to be. They can vary, perhaps being wider at greater survival times.

The hazard rate obtained from a life table is equal to the probability of dying during each time slice (Column F) divided by the width of the slice, so the hazard rate for the first year would be expressed as 0.105 per year, or 10.5 percent per year.

The cumulative survival probability, in Column H, is the probability of surviving from the operation date through to the end of the interval. It has no units, and it can be expressed as a fraction or as a percentage. The value for any time slice applies to the moment in time at the end of the interval.

The cumulative survival probability is always 1.0 (100 percent) at time 0 (in this example, the time of surgery). This initial value isn’t shown in the table.

The cumulative survival function decreases only at the end of an interval that has at least one observed death. Censored observations don’t cause a drop in the estimated survival, although they do influence the size of the drops when subsequent events occur (because censored events reduce the number of subjects at risk, which is used in the calculation of the death and survival probabilities).

If an interval contains no events at all (no deaths and no censored subjects), like the second year (1–2 years) row in the table, it has no effect whatsoever on the calculations. All subsequent values for B and E through H remain the same as if that row had never been in the table.
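You can check the last point yourself with a small Python sketch that runs the whole life-table calculation twice, once with the empty second-year row and once without it. The function name cumulative_survival is made up for this illustration, and only the first three years' counts from the text are used.

```python
def cumulative_survival(n_start, rows):
    """rows holds one (died, censored) pair per time slice; returns Column H."""
    alive, surv, column_h = n_start, 1.0, []
    for died, censored in rows:
        at_risk = alive - censored / 2   # Column E
        surv *= 1 - died / at_risk       # multiply by Column G
        column_h.append(surv)
        alive -= died + censored         # next row's Column B
    return column_h


with_empty_row = cumulative_survival(10, [(1, 1), (0, 0), (2, 0)])
without_empty_row = cumulative_survival(10, [(1, 1), (2, 0)])

print(round(with_empty_row[-1], 3), round(without_empty_row[-1], 3))  # 0.671 0.671
```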

Graphing hazard rates and survival probabilities from a life table

Graphs of hazard rates and survival probabilities can be prepared directly from the results of a life table calculation using almost any spreadsheet or program that can make graphs from numerical data. Figure 22-5 illustrates the way these results are typically presented.

Figure 22-5a is a graph of hazard rates. Hazard rates are often graphed as bar charts, because the hazard rates are calculated for (and pertain to) each time slice in a life table.

Figure 22-5b is a graph of survival probabilities. Survival values are usually graphed as stepped line charts, where the survival value calculated in each row of a life table “takes effect” at the end of that row’s time slice. You might think it makes more sense to “connect the dots” with straight-line segments that descend gradually during each year rather than drop suddenly at the end of each year. (I certainly thought so in my early days!) But statisticians have good reasons for graphing survival “curves” as step charts, and that’s how they’re always shown.

Illustration by Wiley, Composition Services Graphics

Figure 22-5: Hazard function (a) and survival function (b) results from life-table calculations.
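If your graphing program doesn't have a built-in step-chart option, you can build the corner points of the step curve yourself. Here is a minimal Python sketch using the first three years' cumulative survival values; the function name step_points is an assumption.

```python
def step_points(slice_ends, cum_surv):
    """Return (time, survival) corners: flat through each slice, dropping at its end."""
    points = [(0.0, 1.0)]  # survival is 1.0 at time 0
    level = 1.0
    for t, s in zip(slice_ends, cum_surv):
        points.append((float(t), level))  # horizontal run to the end of the slice
        points.append((float(t), s))      # vertical drop (if any) at the end
        level = s
    return points


points = step_points([1, 2, 3], [0.895, 0.895, 0.671])
for time, survival in points:
    print(time, survival)
```

Plotting these points in order with ordinary straight-line segments gives the stepped appearance. Many plotting libraries can also do this directly; in matplotlib, for example, the step function with where='post' draws the same shape from the slice end times and survival values (with the time-0 point included).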

Digging Deeper with the Kaplan-Meier Method

Using very narrow time slices doesn’t hurt life-table calculations. In fact, you can define slices so narrow that each subject’s survival time falls within its own private little slice. With N subjects, N rows would have one subject each; all the rest of the rows would be empty. And because empty rows don’t affect the life-table calculations, you can delete them entirely, leaving a table with only N rows, one for each subject. (If you happen to have two or more subjects with exactly the same survival or censoring time, it’s okay to put each of the subjects in a separate row.)

The life-table calculations work fine with only one subject per row and produce what’s called Kaplan-Meier (K-M) survival estimates. You can think of the K-M method as a very fine-grained life table or a life table as a grouped K-M calculation.

A K-M worksheet for the survival times shown in Figure 22-2, based on the one-subject-per-row idea, looks like Figure 22-6. It’s laid out much like the usual life-table worksheet in Figure 22-4 but with a few differences in the raw data cells and minor differences in the calculations:

Instead of a column identifying the time slices, there are two columns (A and B) identifying the subject and the survival or censoring time, in order from the shortest time to the longest.

Instead of two columns containing the number of subjects who died and the number of subjects who were censored in each interval, you need only one column (C) indicating whether or not the subject in that row died. You use 1 if the subject died and 0 if the subject was censored (alive at the end of the study or lost to follow-up).

The Alive at Start column (D) now decreases by 1 for each subject.

The At Risk column in Figure 22-4 isn’t needed; the probability can be calculated from the Alive at Start column. That’s because if the subject is censored, the probability of dying is calculated as 0, regardless of the value of the denominator.

The probability of dying (Column E) is calculated as E = C/D; that is, by dividing the “Died” indicator (1 or 0) by the number of subjects alive at that time.

The probability of surviving and the cumulative survival (Columns F and G) are calculated exactly as in the life-table method.
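The worksheet layout just described can be sketched as a short Python function. The five subjects below are hypothetical (they are not the ten subjects in Figure 22-2), and the function name kaplan_meier is my own. I've also assumed the common convention that, when a death and a censoring share exactly the same time, the death is listed first.

```python
def kaplan_meier(observations):
    """observations: (time, died) pairs, died = 1 for a death, 0 for censored.

    Returns one row per subject: (time, died, alive_at_start, p_die, cum_surv).
    """
    # Sort by time; at tied times put deaths before censorings (assumed convention)
    ordered = sorted(observations, key=lambda obs: (obs[0], -obs[1]))
    alive = len(ordered)
    surv = 1.0
    rows = []
    for time, died in ordered:
        p_die = died / alive   # Column E = C / D; 0 for a censored row
        surv *= 1 - p_die      # running product of the survival probabilities
        rows.append((time, died, alive, p_die, surv))
        alive -= 1             # Alive at Start drops by 1 for each subject
    return rows


# Hypothetical data, invented for illustration
data = [(0.7, 1), (1.5, 0), (2.3, 1), (3.1, 1), (4.0, 0)]
rows = kaplan_meier(data)
for time, died, alive, p_die, surv in rows:
    print(f"{time:4.1f}  died={died}  alive={alive}  P(die)={p_die:.3f}  S={surv:.3f}")
```

Notice that the cumulative survival stays put on the censored rows and drops only on the rows where a death occurred.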

Figure 22-7 shows graphs of the K-M hazard and survival estimates from Figure 22-6. These charts were created using the R statistical software, but most statistics software that performs survival analysis can create graphs similar to this. The K-M survival curve in Figure 22-7b is now more fine-grained (has smaller steps) than the life-table survival curve in Figure 22-5b, because the step curve now has a drop at every time point at which a subject died (0.74 years for Subject 1, 2.27 years for Subject 9, 2.34 years for Subject 4, and so on).

Illustration by Wiley, Composition Services Graphics

Figure 22-6: Kaplan-Meier calculations.

Illustration by Wiley, Composition Services Graphics

Figure 22-7: Kaplan-Meier estimates of the hazard (a) and survival (b) functions.

While the K-M survival curve tends to be smoother than the life-table survival curve, just the opposite is true for the hazard curve. In Figure 22-7a, each subject has his own very thin bar, and the resulting chart isn’t easy to interpret.

Heeding a Few Guidelines for Life Tables and the Kaplan-Meier Method

Most of the larger statistical packages (SPSS, SAS, OpenStat, R, and so on; see Chapter 4) can perform life-table and Kaplan-Meier calculations for you and directly generate survival curves. The process is usually quite simple; you just have to provide the program with two variables: the survival time for each subject, and an indicator of whether that survival time represents the actual time to death or is a censored time. But you still have several golden opportunities to mess things up royally if you’re not careful. Here are some pointers for setting up your data and interpreting the results properly.

Recording survival times the right way

When dealing with intervals between time points, you should enter the actual dates and times of time points and let the computer calculate the intervals between those time points. And when recording the raw data that will ultimately be used in a survival analysis, it’s best to enter all the relevant dates and times — diagnosis, start of therapy, end of therapy, start of improvement or remission, relapse, event (each event if it’s a recurring one), death, last seen date, and so on. Then you’ll be able to calculate intervals between any starting point (diagnosis or treatment, for example) and any event (such as remission, relapse, death, and so forth).

Dates (and times) should be recorded to suitable precision. If you’re dealing with things that happen over the course of months or years (like cancer), you may get by recording dates to the nearest month. But if you’re interested in intervals that span only a few days (such as studying treatments for postoperative ileus), you should record dates and times to the nearest hour. When studying duration of labor, you should record time to the nearest minute. You can even envision laboratory studies of intracellular events where time would have to be recorded with millisecond — or even microsecond — precision!

Most modern spreadsheet, database, and statistical software lets you enter dates and times into a single variable (or a single cell of the spreadsheet). This is much better than having two different variables (or using two columns in a spreadsheet) — one for date and one for time of day. Having a single date/time variable lets the computer perform calendar arithmetic — you can obtain intervals between any two events by simple subtraction of the starting and ending date/time variables.
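Here is a small Python sketch of that kind of calendar arithmetic, using the standard datetime module. The first interval is the heart valve example from earlier in this chapter (the year is an assumption, because the text gives only the month and day); the second uses hypothetical date/time values to show an interval in hours.

```python
from datetime import date, datetime

# Implant on January 10, explant on January 30 (year assumed for illustration)
implant = date(2024, 1, 10)
explant = date(2024, 1, 30)
days = (explant - implant).days
print(days)  # 20

# Hypothetical start and end recorded as single date/time values
start = datetime(2024, 3, 4, 8, 30)
end = datetime(2024, 3, 6, 14, 0)
hours = (end - start).total_seconds() / 3600
print(hours)  # 53.5
```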

Recording censoring information correctly

People usually get the survival time variable right, but they may miscode the censoring indicator. The software may want you to use 0 or 1, or any two different numerical or character codes to distinguish actual from censored observations. The most common way is to use 1 if the event actually occurred (the subject died), and 0 if the observation is censored. But you might think that a variable that’s referred to as the “censored” indicator should be 1 if the observation is censored and 0 if it’s not censored.

Bad news: If you code the censoring indicator one way and the software is expecting it another way, the program may mistakenly process all the censored observations as uncensored and vice versa. Worse news: You won’t get any warning or error message from the program; you’ll only get incorrect results. Worst news: Depending on how many censored and uncensored observations you have, the survival curve may not display any sign of trouble — it may look like a perfectly reasonable survival curve for your data, even though it’s completely wrong.

You have to check the software manual very carefully to make sure you code the censoring indicator the right way. Also, check the program’s output for the number of censored and uncensored observations and compare them to your own manual count of censored and uncensored subjects in your data file.
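A quick count like the following Python sketch can serve as that sanity check. The indicator list codes the ten subjects from Figure 22-1 in subject order (1 = died, 0 = censored), based on the subject numbers given earlier in this chapter; the function name count_events is an assumption.

```python
from collections import Counter


def count_events(indicators, event_code=1):
    """Return (number of events, number censored) for a list of indicator codes."""
    counts = Counter(indicators)
    n_events = counts[event_code]
    return n_events, len(indicators) - n_events


# Subjects 1 to 10: deaths were #1, 2, 4, 6, 9, and 10; the rest were censored
died_indicator = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
print(count_events(died_indicator))  # (6, 4)

# The same data miscoded the other way around (1 = censored)
flipped = [1 - code for code in died_indicator]
print(count_events(flipped))  # (4, 6): a mismatch with your manual count
```

If the counts your software reports look like the second line when you expected the first, the indicator is probably coded backward.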

Interpreting those strange-looking survival curves

Survival curves (like those shown in Figures 22-5 and 22-7) look different from most other kinds of graphs you see in biological books and publications. Not only is their stepped appearance unusual, but they also contain several kinds of “artifacts” that can easily confuse people who aren’t familiar with life-table and Kaplan-Meier calculations.

Drops and ticks: The drops in a K-M survival curve occur at every time point where there’s an observed death and only at those time points. The curves do not drop at the times of censored observations. Most statistical software places small, vertical tick marks along the survival curve at every censored time point, so that the graph visibly displays all the censored (ticks) and uncensored (drops) data points.

Indistinguishable curves: When several survival curves are plotted on the same chart, they can be very difficult to tell apart, especially if they’re close together or cross over each other. The individual curves must always be drawn using different colors, different line-widths, or different line-types (solid, dashed, dotted, and so on), or they’ll be almost impossible to distinguish.

Drastic drops in survival: Because the magnitude of each drop in survival depends on the number at risk (in the denominator), which decreases toward the bottom of the life-table or K-M calculation, the drops become larger toward the right side of the survival chart. An extreme “artifact” of this type occurs if the subject with the longest observed time happens to be uncensored (for example, the subject died). In that case the curve drops all the way to zero, which may be completely misleading, because that final drop may rest on a single subject still at risk.

Deceptive precision: Survival curves are usually less precise than they appear to be, and this can lead to your misjudging whether the curves for two or more groups of subjects are significantly different from each other. I say more about that in Chapter 23, dealing with how to compare survival curves.
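
To see where the drops come from, here's a minimal Python sketch of the Kaplan-Meier (product-limit) arithmetic on six made-up subjects. The data, the function name kaplan_meier, and the 1 = died, 0 = censored coding are assumptions for illustration; real software also reports confidence limits, which this sketch leaves out.

```python
def kaplan_meier(records):
    """Return [(time, survival)] with one entry per distinct death time.

    records: iterable of (time, status), assuming status 1 = died, 0 = censored.
    """
    records = sorted(records)
    at_risk = len(records)
    surv = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = removed = 0
        # Gather everyone whose observed time equals t
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            removed += 1
            i += 1
        if deaths:                        # the curve drops only at death times
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed                # censored subjects leave the risk set too
    return curve

# Hypothetical subjects: (years, status)
data = [(1, 1), (2, 0), (3, 1), (4, 0), (5, 1), (6, 1)]
curve = kaplan_meier(data)

previous = 1.0
for t, s in curve:
    print(f"t={t}: survival={s:.4f}, drop={previous - s:.4f}")
    previous = s
```

In this made-up example, the curve has no step at years 2 and 4 (the censored times), the drops grow as the number at risk shrinks, and the estimate falls to zero at year 6 because the subject with the longest time died.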

Doing Even More with Survival Data

Besides giving you an idea of what a true population survival function looks like (the fraction of subjects surviving over the course of time), life-table and Kaplan-Meier survival curves let you estimate several useful numbers that describe survival. Figure 22-8 shows the same K-M survival curve as Figure 22-7b, but with the Y axis labeled as percent (rather than fraction) surviving, and with annotations showing how to estimate the following:

The median (or other centile) survival time: The survival curve in Figure 22-8 declines to 50 percent survival at 5.18 years, so you can say that the median survival after surgery for this cancer is about 5.2 years. Similarly, the graph indicates that 80 percent of subjects are still alive after about 2.2 years.

The five-year (or other time value) survival rate: You can estimate from Figure 22-8 that the five-year survival rate is about 53 percent, and the two-year survival rate is about 89 percent.


Illustration by Wiley, Composition Services Graphics

Figure 22-8: Useful things you can get from a survival curve.
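
The two readings described above can also be taken from the curve numerically. Here's a minimal Python sketch that treats a survival curve as a list of steps. The step values are made up for illustration (they are not the data behind Figure 22-8), and the function names survival_at and time_to_level are hypothetical. The sketch uses one common convention, taking the median as the first time the curve reaches 50 percent or lower; some programs handle a curve that sits exactly at 50 percent differently.

```python
def survival_at(curve, t):
    """Percent surviving at time t: the height of the last step at or before t."""
    pct = 100.0
    for time, surv in curve:
        if time > t:
            break
        pct = surv
    return pct

def time_to_level(curve, level=50.0):
    """First time the curve is at or below level percent; None if it never gets there."""
    for time, surv in curve:
        if surv <= level:
            return time
    return None

# Hypothetical K-M steps: (years, percent surviving after the drop)
curve = [(1.0, 90.0), (2.0, 75.0), (3.5, 60.0), (5.0, 45.0), (6.0, 30.0)]

print("median survival (years):", time_to_level(curve))
print("time to 80 percent surviving:", time_to_level(curve, 80.0))
print("four-year survival (percent):", survival_at(curve, 4.0))
```

A result of None from time_to_level means the curve never falls that far, which happens often when many of the longest-followed subjects are censored; in that case the median simply can't be estimated from the data.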

Besides preparing hazard and survival curves, you may want to do other things with your survival data:

Compare survival between two or more groups of subjects. You may want to test whether subjects receiving a certain therapy survive longer than subjects receiving a placebo, whether males survive longer than females, or whether survival decreases with increasing stage or grade of disease. You can perform life-table or Kaplan-Meier calculations on each subgroup of subjects and then plot all the survival curves on the same graph. In Chapter 23, I describe how to test for significant differences in survival between two or more groups of subjects.

Determine whether survival is affected by other factors (called covariates), such as stage of disease, subject age, prior medical history, and so on. And if so, you may want to quantify the size of that effect. You may also want to mathematically compensate for the effects of other variables when you’re comparing survival between treatment groups. For other types of outcome data, this compensation is usually done by regression analysis, and you’ll be happy to know that there’s a special kind of regression designed just for survival outcomes. I describe it in Chapter 24.

Prepare a customized prognosis chart — one that shows the expected survival curve for a particular person based on such factors as age, gender, stage of disease, and so forth. Survival regression can also generate these customized survival curves; I show you how in Chapter 24.
