Chapter 5
Conducting Clinical Research
In This Chapter
Planning and carrying out a clinical research study
Protecting the subjects
Collecting, validating, and analyzing research data
This chapter and the next one provide a closer look at a special kind of biological research — the clinical trial. This chapter describes some aspects of conducting clinical research; Chapter 6 gives you the “big picture” of pharmaceutical drug trials — an example of a high-profile, high-stakes, highly regulated research endeavor. Although you may never be involved in something as massive as a drug trial, the principles are just as relevant, even if you’re only trying to show whether drinking a fruit smoothie every day gives you more energy.
Designing a Clinical Study
Clinical studies should conform to the highest standards of scientific rigor, and that starts with the design of the study. The following sections note some aspects of good experimental design you should keep in mind at the start of any research project.
Identifying aims, objectives, hypotheses, and variables
The aims or goals of a study are short general statements (often just one statement) of the overall purpose of the trial. For example, the aim of a study may be “to assess the safety and efficacy of drug XYZ in patients with moderate hyperlipidemia.”
The objectives are much more specific than the aims. Objectives usually refer to the effect of the product on specific safety and efficacy variables, at specific points in time, in specific groups of subjects. An efficacy study may have many individual efficacy objectives, as well as one or two safety objectives; a safety study may or may not have efficacy objectives.
A typical set of primary, secondary, exploratory, and safety objectives (this example shows one of each type) for an efficacy study might look like this:
Primary efficacy objective: To compare the effect of drug XYZ, relative to placebo, on changes in serum total cholesterol from baseline to week 12, in patients with moderate hyperlipidemia.
Secondary efficacy objective: To compare the effect of drug XYZ, relative to placebo, on changes in serum total cholesterol and serum triglycerides from baseline to weeks 4 and 8, in patients with moderate hyperlipidemia.
Exploratory efficacy objective: To compare the effect of drug XYZ, relative to placebo, on changes in serum lipids from baseline to weeks 4, 8, and 12, in male and female subsets of patients with moderate hyperlipidemia.
Safety objective: To evaluate the safety of drug XYZ, relative to placebo, in terms of the occurrence of adverse events, changes from baseline in vital signs (blood pressure and heart rate), and safety laboratory results (chemistry, hematology, and so on), in patients with moderate hyperlipidemia.
Hypotheses usually correspond to the objectives but are worded in a way that directly relates to the statistical testing to be performed. So the preceding primary objective may correspond to the following hypothesis: “The mean 12-week reduction in total cholesterol will be greater in the XYZ group than in the placebo group.” Alternatively, the hypothesis may be expressed in a more formal mathematical notation and as a null and alternate pair (see Chapters 2 and 3 for details on these terms and the mathematical notation used):
H_Null: Δ_XYZ – Δ_Placebo = 0
H_Alt: Δ_XYZ – Δ_Placebo > 0
where Δ = mean of (TChol_Week12 – TChol_Baseline).
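Once the data are in, a null/alternate pair like this one is typically tested with a one-sided two-sample t test (see Chapter 12). The following pure-Python sketch shows how the test statistic is formed; the cholesterol-change values and group sizes are invented for illustration, and in practice you'd use a statistics package rather than rolling your own:

```python
import math

# Hypothetical 12-week reductions in total cholesterol (mg/dL); positive
# values mean cholesterol went down. These numbers are invented.
xyz     = [38.0, 45.0, 29.0, 51.0, 40.0, 33.0]   # drug XYZ group
placebo = [12.0,  8.0, 15.0,  5.0, 10.0,  9.0]   # placebo group

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance form)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * sample_var(a) + (nb - 1) * sample_var(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se

t_stat = pooled_t(xyz, placebo)
df = len(xyz) + len(placebo) - 2

# One-sided critical value for alpha = 0.05 with 10 df, from a t table.
t_crit = 1.812
print(f"t = {t_stat:.2f} on {df} df; reject H_Null: {t_stat > t_crit}")
```

With these made-up numbers the t statistic lands far above the critical value, so the null hypothesis would be rejected; real data are rarely so obliging.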
Deciding what data to collect
During the study you’ll record many kinds of data for each subject, typically including the following:
Basic demographic information (such as date of birth, gender, race, and ethnicity)
Information about the subject’s participation in the study (for instance, date of enrollment, whether the subject met each inclusion and exclusion criterion, date of each visit, measures of compliance, and final status, such as completed, withdrew, or lost to follow-up)
Basic baseline measurements (height, weight, vital signs, safety laboratory tests, and so forth)
Subject and family medical history, including diseases, hospitalizations, smoking and other substance use, and current and past medications
Laboratory and other testing (ECGs, X-rays, and so forth) results related to the study’s objectives
Responses from questionnaires and other subjective assessments
Occurrence of adverse events
Some of this information needs to be recorded only once (like birthdate, gender, and family history); other information (such as vital signs, dosing, and test results) may be acquired at scheduled or unscheduled visits, and some may be recorded only at unpredictable times, if at all (like adverse events).
Deciding who will be in the study
Because you can’t examine the entire population of people with the condition you’re studying, you must select a representative sample from that population (see Chapter 3 for an introduction to populations and samples). You do this by explicitly defining the conditions that determine whether or not a subject is suitable to be in the study.
Inclusion criteria are used during the screening process to identify potential subjects and usually involve subject characteristics that define the population you want to draw conclusions about. A reasonable inclusion criterion for a study of a lipid-lowering treatment would be, “Subject must have a documented diagnosis of hyperlipidemia, defined as Total Cholesterol > 200 mg/dL and LDL > 130 mg/dL at screening.”
Exclusion criteria are used to identify subjects for whom participation would be unsafe or those whose participation would compromise the scientific integrity of the study (due to preexisting conditions, an inability to understand instructions, and so on). The following usually appears in the list of exclusion criteria: “The subject is, in the judgment of the investigator, unlikely to be able to understand and comply with the treatment regimen prescribed by the protocol.”
Withdrawal criteria describe situations that could arise during the study that would prevent the subject’s further participation for safety or other reasons (such as an intolerable adverse reaction or a serious noncompliance). A typical withdrawal criterion may be “The subject has missed two consecutive scheduled clinic visits.”
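Threshold-based criteria like the cholesterol cutoffs above map naturally onto simple screening checks. Here's a minimal sketch; the function and argument names are my own, not from any standard:

```python
def meets_inclusion(total_chol, ldl):
    """Inclusion criterion from the text: documented hyperlipidemia,
    defined as Total Cholesterol > 200 mg/dL and LDL > 130 mg/dL
    at screening."""
    return total_chol > 200 and ldl > 130

print(meets_inclusion(total_chol=215, ldl=140))  # eligible
print(meets_inclusion(total_chol=215, ldl=120))  # LDL too low, so excluded
```

In a real study, the full set of inclusion and exclusion criteria would be checked, and many of them (such as the investigator's clinical judgment) can't be reduced to a numeric rule.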
Choosing the structure of the study
Most clinical trials involving two or more test products have one of the following structures (or designs), each of which has both pros and cons:
Parallel: Each subject receives one of the products. Parallel designs are simpler, quicker, and easier for each subject, but you need more subjects. Trials with very long treatment periods usually have to be parallel. The statistical analysis of parallel trials is generally simpler than for crossover trials (see the next bullet).
Crossover: Each subject receives all the products in sequence during consecutive treatment periods (called phases) separated by washout intervals (lasting from several days to several weeks). Crossover designs can be more efficient, because each subject serves as his own control, eliminating subject-to-subject variability. But you can use crossover designs only if you’re certain that at the end of each washout period the subject will have been restored to the same condition as at the start of the study; this may be impossible for studies of progressive diseases, like cancer or emphysema.
Using randomization
Randomized controlled trials (RCTs) are the gold standard for clinical research. In an RCT, the subjects are randomly allocated into treatment groups (in a parallel trial) or into treatment-sequence groups (in a crossover design). Randomization provides several advantages:
It tends to eliminate selection bias — preferentially giving certain treatments to certain subjects (assigning a placebo to the less “likeable” subjects) — and confounding, where the treatment groups differ with respect to some characteristic that influences the outcome.
It permits the application of statistical methods to the analysis of the data.
It facilitates blinding. Blinding (also called masking) refers to concealing the identity of the treatment from subjects and researchers, and can be one of two types:
• Single-blinding: The subjects don’t know what treatment they’re receiving, but the investigators do.
• Double-blinding: Neither the subjects nor the investigators know which subjects are receiving which treatments.
Blinding eliminates bias resulting from the placebo effect, whereby subjects often respond favorably to any treatment (even a placebo), especially when the efficacy variables are subjective, such as pain level. Double-blinding also eliminates deliberate and subconscious bias in the investigator’s evaluation of a subject’s condition.
The simplest kind of randomization involves assigning each newly enrolled subject to a treatment group by the flip of a coin or a similar method. But simple randomization may produce an unbalanced pattern, like the one shown in Figure 5-1 for a small study of 12 subjects and two treatments: Drug (D) and Placebo (P).
Illustration by Wiley, Composition Services Graphics
Figure 5-1: Simple randomization.
If you were hoping to have six subjects in each group, you won’t like having only three subjects receiving the drug and nine receiving the placebo, but unbalanced patterns like this arise quite often from 12 coin flips. (Try it if you don’t believe me.)
A better approach is to require six subjects in each group, but to shuffle those six Ds and six Ps around randomly, as shown in Figure 5-2:
Figure 5-2: Random shuffling.
This arrangement is better (there are exactly six drug and six placebo subjects), but this particular random shuffle happens to assign more drugs to the earlier subjects and more placebos to the later subjects (again, bad luck of the draw). If these 12 subjects were enrolled over a period of five or six months, seasonal effects might be mistaken for treatment effects (an example of confounding).
To make sure that both treatments are evenly spread across the entire recruitment period, you can use blocked randomization, in which you divide your subjects into consecutive blocks and shuffle the assignments within each block. Often the block size is set to twice the number of treatment groups (for instance, a two-group study would use a block size of four), as shown in Figure 5-3.
Figure 5-3: Blocked randomization.
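A blocked scheme like the one in Figure 5-3 is easy to generate in software. Here's one way to sketch it in Python; the function name and defaults are my own, and real trials use validated randomization systems rather than ad-hoc code:

```python
import random

def blocked_randomization(n_subjects, treatments=("D", "P"),
                          block_multiple=2, seed=None):
    """Generate treatment assignments using blocked randomization.

    Each block contains every treatment an equal number of times
    (block size = block_multiple * number of treatments), and the
    order within each block is shuffled independently, so the
    treatments stay balanced across the whole enrollment period.
    """
    rng = random.Random(seed)
    block = list(treatments) * block_multiple
    assignments = []
    while len(assignments) < n_subjects:
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_subjects]

schedule = blocked_randomization(12, seed=1)
print("".join(schedule))
print("D count:", schedule.count("D"), "P count:", schedule.count("P"))
```

For 12 subjects and two treatments this produces three blocks of four, so every group of four consecutive subjects contains exactly two Ds and two Ps, and the study as a whole is exactly balanced at six and six.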
Selecting the analyses to use
You should select the appropriate method for each of your study hypotheses based on the kind of data involved, the structure of the study, and the nature of the hypothesis. The rest of this book describes statistical methods to analyze the kinds of data you’re likely to encounter in clinical research. Changes in variables over time and differences between treatments in crossover studies are often analyzed by paired t tests and repeated-measures ANOVAs, and differences between groups of subjects in parallel studies are often analyzed by unpaired t tests and ANOVAs (see Chapter 12 for more on t tests and ANOVAs). Differences in the percentage of subjects responding to treatment or experiencing events are often compared with chi-square or Fisher Exact tests (see Chapters 13 and 14 for the scoop on these tests). The associations between two or more variables are usually analyzed by regression methods (get the lowdown on regression in Part IV). Survival times (and the times to the occurrence of other endpoint events) are analyzed by survival methods (turn to Part V for the specifics of survival analysis).
Defining analytical populations
Most clinical studies define several analytical populations, each suited to a different kind of analysis:
The safety population: This group usually consists of all subjects who received at least one dose of any study product (even a placebo) and had at least one subsequent safety-related visit or observation. All safety-related tabulations and analyses are done on the safety population.
The intent-to-treat (ITT) population: This population usually consists of all subjects who received any study product. The ITT population is useful for assessing effectiveness — how well the product performs in the real world, where people don’t always take the product as recommended (because of laziness, inconvenience, unpleasant side effects, and so on).
The per-protocol (PP) population: This group is usually defined as all subjects who complied with the rules of the study — those people who took the product as prescribed, made all test visits, and didn’t have any serious protocol violations. The PP population is useful for assessing efficacy — how well the product works in an ideal world where everyone takes it as prescribed.
Other special populations may be defined for special kinds of analysis. For example, if the study involves taking a special set of blood samples for pharmacokinetic (PK) calculations, the protocol usually defines a PK population consisting of all subjects who provided suitable PK samples.
Determining how many subjects to enroll
You should enroll enough subjects to provide sufficient statistical power (see Chapter 3) when testing the primary objective of the study. The specific way you calculate the required sample size depends on the statistical test that's used for the primary hypothesis. Each chapter of this book that describes hypothesis tests shows how to estimate the required sample size for that test. Also, you can use the formulas, tables, and charts in Chapter 26 and in the Cheat Sheet (at www.dummies.com/cheatsheet/biostatistics) to get quick sample-size estimates.
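To give a flavor of what such a calculation looks like, here's a rough normal-approximation formula for the subjects needed per group when comparing two means. The planning values (a 15 mg/dL difference in mean reduction, with a 30 mg/dL standard deviation) are invented, and the approximation slightly understates the n that an exact t-based calculation would give:

```python
from statistics import NormalDist
import math

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample
    comparison of means, via the normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) / (delta / sigma)) ** 2
    where delta is the difference you want to detect and sigma is
    the within-group standard deviation."""
    z = NormalDist().inv_cdf
    effect = delta / sigma
    n = 2 * ((z(1 - alpha / 2) + z(power)) / effect) ** 2
    return math.ceil(n)

# Hypothetical planning values: detect a 15 mg/dL difference in mean
# cholesterol reduction, assuming a 30 mg/dL SD (effect size 0.5).
print(n_per_group(delta=15, sigma=30))
```

Halving the detectable difference quadruples the required sample size, which is why sample-size planning deserves care before the protocol is finalized.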
Putting together the protocol
A protocol is a document that lays out exactly what you plan to do in a clinical study. Ideally, every study involving human subjects should have a protocol. The following sections list standard components and administrative information found in a protocol.
Standard elements
A formal drug trial protocol usually contains most of the following components:
Title: A title conveys as much information about the trial as you can fit into one sentence, including the protocol ID, study name (if it has one), clinical phase, type and structure of trial, type of randomization and blinding, name of the product, treatment regimen, intended effect, and the population being studied (what medical condition, in what group of people). A title can be quite long — this one has all the preceding elements:
Protocol BCAM521-13-01 (ASPIRE-2) — a Phase-IIa, double-blind, placebo-controlled, randomized, parallel-group study of the safety and efficacy of three different doses of AM521, given intravenously, once per month for six months, for the relief of chronic pain, in adults with knee osteoarthritis.
Background information: This section includes info about the disease (such as its prevalence and impact), known physiology (at the molecular level, if known), treatments currently available (if any), and information about this drug (its mechanism of action, the results of prior testing, and known and potential risks and benefits to subjects).
Rationale: The rationale for the study states why it makes sense to do this study at this time, including a justification for the choice of doses, how the drug is administered (such as orally or intravenously), and the duration of therapy and follow-up.
Aims, objectives, and hypotheses: I discuss these items in the earlier section Identifying aims, objectives, hypotheses, and variables.
Detailed descriptions of all inclusion, exclusion, and withdrawal criteria: See the earlier section Deciding who will be in the study for more about these terms.
Design of study: The study’s design defines its structure (check out the earlier section Choosing the structure of the study), the number of treatment groups, and the consecutive stages (screening, washout, treatment, follow-up, and so on). This section often includes a schematic diagram of the structure of the study.
Product description: This description details each product that will be administered to the subjects, including the chemical composition (with the results of chemical analysis of the product, if available) and how to store, prepare, and administer the product.
Blinding and randomization schemes: These schemes include descriptions of how and when the study will be unblinded (including the emergency unblinding of individual subjects, if necessary); see the earlier section Using randomization.
Procedural descriptions: This section describes every procedure that will be performed at every visit, including administrative procedures (such as enrollment and informed consent) and diagnostic procedures (for example, physical exams and vital signs).
Safety considerations: These factors include the known and potential side effects of the product and each test procedure (such as X-rays, MRI scans, and blood draws), including steps taken to minimize the risk to the subjects.
Handling of adverse events: This section describes how adverse events will be recorded — description, severity, dates and times of onset and resolution, any medical treatment given for the event, and whether or not the investigator thinks the event was related to the study product. Reporting adverse events has become quite standardized over the years, so this section tends to be very similar for all studies.
Definition of safety, efficacy, and other analytical populations: This section includes definitions of safety and efficacy variables and endpoints (variables or changes in variables that serve as indicators of safety or efficacy). See the earlier section Defining analytical populations.
Planned enrollment and analyzable sample size: Justification for these numbers must also be provided.
Proposed statistical analyses: Some protocols describe, in detail, every analysis for every objective; others have only a summary and refer to a separate Statistical Analysis Plan (SAP) document for details of the proposed analysis. This section should also include descriptions of the treatment of missing data, adjustments for multiple testing to control Type I errors (see Chapter 3), and whether any interim analyses are planned. If a separate SAP is used, it will usually contain a detailed description of all the calculations and analyses that will be carried out on the data, including the descriptive summaries of all data and the testing of all the hypotheses specified in the protocol. The SAP also usually contains mock-ups, or “shells” of all the tables, listings, and figures (referred to as TLFs) that will be generated from the data.
Administrative details
A protocol also has sections with more administrative information:
Names of and contact info for the sponsor, medical expert, and primary investigator, plus the physicians, labs, and other major medical or technical groups involved
A table of contents, similar to the kind you find in many books (including this one)
A synopsis, which is a short (usually around two pages) summary of the main components of the protocol
A list of abbreviations and terms appearing in the protocol
A description of your policies for data handling, record-keeping, quality control, ethical considerations, access to source documents, and publication of results
Financing and insurance agreements
Descriptions of all amendments made to the original protocol
Carrying Out a Clinical Study
After you’ve designed your study and have described it in the protocol document, it’s time to set things in motion. The operational details will, of course, vary from one study to another, but a few aspects apply to all clinical studies. In any study involving human subjects, the most important consideration is protecting those subjects from harm, and an elaborate set of safeguards has evolved over the past century. And in any scientific investigation, the accurate collection of data is crucial to the success of the research.
Protecting your subjects
Protecting the people who take part in your study involves two distinct considerations:
Safety: Minimizing the risk of physical harm to the subjects from the product being tested and from the procedures involved in the study
Privacy/confidentiality: Ensuring that data collected during the study is not made public in a way that identifies a specific subject without the subject’s consent
The following sections describe some of the “infrastructure” that helps protect human subjects.
Surveying regulatory agencies
In the United States, several government organizations oversee human subjects’ protection:
Commercial pharmaceutical research is governed by the Food and Drug Administration (FDA).
Most academic biological research is sponsored by the National Institutes of Health (NIH) and is governed by the Office for Human Research Protections (OHRP).
Chapter 6 describes the ways investigators interact with these agencies during the course of clinical research.
Working with Institutional Review Boards
For all but the very simplest research involving human subjects, you need the approval of an IRB — an Institutional (or Independent) Review Board — before enrolling any subjects into your study. You have to submit an application along with the protocol and an ICF (see the next section) to an IRB with jurisdiction over your research.
Most medical centers and academic institutions — and some pharmaceutical companies — have their own IRBs with jurisdiction over research conducted at their institution. If you’re not affiliated with one of these centers or institutions (for example, if you’re a physician in private practice), you may need the services of a “free-standing” IRB. The sponsor of the research may suggest (or dictate) an IRB for the project.
Getting informed consent
An important part of protecting human subjects is making sure that they’re aware of the risks of a study before agreeing to participate in it. You must prepare an Informed Consent Form (ICF) describing, in simple language, the nature of the study, why it is being conducted, what is being tested, what procedures subjects will undergo, and what the risks and benefits are. Subjects must be told that they can refuse to participate and can withdraw at any time for any reason, without fear of retribution or the withholding of regular medical care. The IRB can usually provide ICF templates with examples of their recommended or required wording.
Considering data safety monitoring boards and committees
For clinical trials of products that are likely to be of low risk, investigators are usually responsible for being on the lookout for signs of trouble (unexpected adverse events, abnormal laboratory tests, and so forth) during the course of the study. But for studies involving high-risk treatments (like cancer chemotherapy trials), a separate data safety monitoring board or committee (DSMB or DSMC) may be set up. A DSMB may be required by the sponsor, the investigator, the IRB, or a regulatory agency. A DSMB typically has about six members (usually expert clinicians in the relevant area of research and a statistician) who meet at regular intervals to review the safety data acquired up to that point. The committee is authorized to modify, suspend, or even terminate a study if it has serious concerns about the safety of the subjects.
Getting certified in human subjects protection and good clinical practice
As you've probably surmised from the preceding sections, clinical research is fraught with regulatory requirements (with severe penalties for noncompliance), and you shouldn't try to "wing it" and hope that everything goes well. You should ensure that you, along with any others who may be assisting you, are properly trained in matters relating to human subjects protection. Fortunately, such training is readily available. Most hospitals and medical centers provide yearly training (often as a half-day session), after which you receive a certification in human subjects protection. Most IRBs and funding agencies require proof of certification from all people who are involved in the research. If you don't have access to that training at your institution, you can get certified by taking an online tutorial offered by the NIH (grants.nih.gov/grants/policy/hs/training.htm).
You should also have one or more of the people who will be involved in the research take a course in “good clinical practice” (GCP). GCP certification is also available online (enter “GCP certification” in your favorite browser).
Collecting and validating data
If the case report form (CRF) has been carefully and logically designed, entering each subject’s data in the right place on the CRF should be straightforward. Then you need to get this data into a computer for analysis. You can enter your data directly into the statistics software you plan to use for the majority of the analysis (see Chapter 4 for some software options), or you can enter it into a general database program such as MS Access or a spreadsheet program like Excel. The structure of a computerized database usually reflects the structure of the CRF. If a study is simple enough that a single data sheet can hold all the data, then a single data file (called a table) or a single Excel worksheet will suffice. But for most studies, a more complicated database is required, consisting of a set of tables or Excel worksheets (one for each kind of data collection sheet in the CRF). If the design of the database is consistent with the structure of the CRF, entering the data from each CRF sheet into the corresponding data table shouldn’t be difficult.
Have one person read data from the source documents or CRFs while another looks at the data that’s in the computer. Ideally, this is done with all data for all subjects.
Have the computer display the smallest and largest values of each variable. Better yet, have the computer display a sorted list of the values for each variable. Typing errors often produce very large or very small values.
A more extreme approach, but one that’s sometimes done for crucially important studies, is to have two people enter all the data into separate copies of the database; then have the computer automatically compare every single data item between the two databases.
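The range check and double-entry comparison just described take only a few lines of Python; the record layout and field names here are hypothetical:

```python
def value_range(records, variable):
    """Smallest and largest recorded values of one variable;
    typing errors often show up as extreme values."""
    values = sorted(r[variable] for r in records
                    if r.get(variable) is not None)
    return values[0], values[-1]

def double_entry_diffs(db_a, db_b):
    """Compare two independently keyed copies of the data and list
    every (subject id, field) where they disagree. Assumes both
    copies list the subjects in the same order."""
    diffs = []
    for rec_a, rec_b in zip(db_a, db_b):
        for field in rec_a:
            if rec_a[field] != rec_b.get(field):
                diffs.append((rec_a["id"], field))
    return diffs

# Hypothetical records: subject 2's weight was keyed as 731 instead of 73.1.
entry_1 = [{"id": 1, "weight_kg": 68.4}, {"id": 2, "weight_kg": 731}]
entry_2 = [{"id": 1, "weight_kg": 68.4}, {"id": 2, "weight_kg": 73.1}]

print(value_range(entry_1, "weight_kg"))     # the 731 stands out
print(double_entry_diffs(entry_1, entry_2))  # flags (2, 'weight_kg')
```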
Chapter 7 has more details on describing, entering, and checking different types of data.
Analyzing Your Data
The remainder of this book explains the methods commonly used in biostatistics to summarize, graph, and analyze data. In the following sections, I describe some general situations that come up in all clinical research, regardless of what kind of analysis you use.
Dealing with missing data
Most clinical trials have incomplete data for one or more variables, which can be a real headache when analyzing your data. The statistical aspects of missing data are quite complicated, so you should consult a statistician if you have more than just occasional, isolated missing values. Here I describe some commonly used approaches to coping with missing data:
Exclude a case from an analysis if any of the required variables for that analysis is missing. This approach can reduce the number of analyzable cases, sometimes quite severely (especially in multiple regression, where the whole case must be thrown out, even if only one of the variables in the regression is missing; see Chapter 19 for more information). And if the result is missing for a reason that’s related to treatment efficacy, excluding the case can bias your results.
Replace (impute) a missing value with the mean (or median) of all the available values for that variable. This approach is quite common, but it introduces several types of bias into your results, so it’s not a good technique to use.
If one of a series of sequential measurements on a subject is missing (like the third of a series of weekly glucose values), use the previous value in the series. This technique is called Last Observation Carried Forward (LOCF) and is one of the most widely used strategies. LOCF usually produces “conservative” results, making it more difficult to prove efficacy. This approach is popular with regulators, who want to put the burden of proof on the drug.
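LOCF itself is mechanically simple. Here's a sketch that represents missing values as None; a leading missing value stays missing because there's nothing earlier to carry forward:

```python
def locf(series):
    """Last Observation Carried Forward: replace each missing value
    (None) with the most recent non-missing value earlier in the
    series. Leading missing values remain missing."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Weekly glucose values with the third measurement missing.
print(locf([102, 98, None, 95]))  # -> [102, 98, 98, 95]
```

The statistical subtleties lie not in the mechanics but in whether carrying the old value forward is defensible for your endpoint, which is another reason to involve a statistician.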
Handling multiplicity
Every time you perform a statistical significance test, you run a chance of being fooled by random fluctuations into thinking that some real effect is present in your data when, in fact, none exists. This scenario is called a Type I error (see Chapter 3). When you say that you require p < 0.05 for significance, you’re testing at the 0.05 (or 5 percent) alpha level (see Chapter 3) or saying that you want to limit your Type I error rate to 5 percent. But that 5 percent error rate applies to each and every statistical test you run. The more analyses you perform on a data set, the more your overall alpha level increases: Perform two tests and your chance of at least one of them coming out falsely significant is about 10 percent; run 40 tests, and the overall alpha level jumps to 87 percent. This is referred to as the problem of multiplicity, or as Type I error inflation.
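The inflation figures quoted above come straight from the formula 1 − (1 − α)^n for n independent tests, each run at level α:

```python
def overall_alpha(n_tests, alpha=0.05):
    """Chance of at least one false-positive result across
    n independent tests, each run at the given alpha level."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 2, 10, 40):
    print(f"{n:>2} tests -> overall alpha = {overall_alpha(n):.0%}")
```

Real sets of tests are rarely fully independent, so the true inflation is usually somewhat less extreme, but the qualitative lesson stands: run enough tests and a false positive becomes nearly certain.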
Some statistical methods involving multiple comparisons (like post-hoc tests following an ANOVA for comparing several groups, as described in Chapter 12) incorporate a built-in adjustment to keep the overall alpha at only 5 percent across all comparisons. But when you’re testing different hypotheses, like comparing different variables at different time points between different groups, it’s up to you to decide what kind of alpha control strategy (if any) you want to implement. You have several choices, including the following:
Don’t control for multiplicity and accept the likelihood that some of your “significant” findings will be falsely significant. This strategy is often used with hypotheses related to secondary and exploratory objectives; the protocol usually states that no final inferences will be made from these exploratory tests. Any “significant” results will be considered only “signals” of possible real effects and will have to be confirmed in subsequent studies before any final conclusions are drawn.
Control the alpha level across only the most important hypotheses. If you have two co-primary objectives, you can control alpha across the tests of those two objectives.
You can control alpha to 5 percent (or to any level you want) across a set of n hypothesis tests in several ways; following are some popular ones:
• The Bonferroni adjustment: Test each hypothesis at the 0.05/n alpha level. So to control overall alpha to 0.05 across two primary endpoints, you need p < 0.025 for significance when testing each one.
• A hierarchical testing strategy: Rank your endpoints in descending order of importance. Test the most important one first, and if it gives p < 0.05, conclude that the effect is real. Then test the next most important one, again using p < 0.05 for significance. Continue until you get a nonsignificant result (p > 0.05); then stop testing (or consider all further tests to be only exploratory and don’t draw any formal conclusions about them).
• Controlling the false discovery rate (FDR): This approach has become popular in recent years to deal with large-scale multiplicity, which arises in areas like genomic testing and digital image analysis that may involve many thousands of tests (such as one per gene or one per pixel) instead of just a few. Instead of trying to avoid even a single false conclusion of significance (as the Bonferroni and other classic alpha control methods do), you simply want to control the proportion of tests that come out falsely positive, limiting that false discovery rate to some reasonable fraction of all the tests. These positive results can then be tested in a follow-up study.
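To make the Bonferroni and FDR ideas concrete, here's a small Python sketch of both. The p values are made up, and the FDR procedure shown is the standard Benjamini-Hochberg step-up method, one common way to control the false discovery rate (the text above doesn't name a specific procedure):

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Bonferroni adjustment: judge each test at alpha / n."""
    n = len(p_values)
    return [p < alpha / n for p in p_values]

def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure: sort the p values,
    find the largest rank k with p_(k) <= (k / n) * fdr, and
    declare the k smallest p values significant."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / n * fdr:
            k_max = rank
    significant = [False] * n
    for i in order[:k_max]:
        significant[i] = True
    return significant

p = [0.001, 0.012, 0.025, 0.047, 0.200]
print(bonferroni_significant(p))  # only p < alpha/n = 0.01 passes
print(benjamini_hochberg(p))      # the FDR criterion is more permissive
```

With five tests, Bonferroni demands p < 0.01 and passes only the first p value, while Benjamini-Hochberg passes the first three, illustrating why FDR control is preferred when thousands of tests are involved.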
Incorporating interim analyses
An interim analysis is one that’s carried out before the conclusion of a clinical trial, using only the data that has been obtained so far. Interim analyses can be blinded or unblinded and can be done for several reasons:
An IRB may require an early look at the data to ensure that subjects aren’t being exposed to an unacceptable level of risk.
You may want to examine data halfway through the trial to see whether the trial can be stopped early for one of the following reasons:
• The product is so effective that going to completion isn’t necessary to prove significance.
• The product is so ineffective that continuing the trial is futile.
You may want to check some of the assumptions that went into the original design and sample-size calculations of the trial (like within-group variability, recruitment rates, base event rates, and so on) to see whether the total sample size should be adjusted upward or downward.
If the interim analysis could possibly lead to early stopping of the trial for proven efficacy, then the issue of multiplicity comes into play, and special methods must be used to control alpha across the interim and final analyses. These methods often involve some kind of alpha spending strategy. The concepts are subtle, and the calculations can be complicated, but here’s a very simple example that illustrates the basic concept. Suppose your original plan is to test the efficacy endpoint at the end of the trial at the 5 percent alpha level. If you want to design an interim analysis into this trial, you may use this two-part strategy:
1. Spend one-fifth of the available 5 percent alpha at the interim analysis.
The interim analysis p value must be < 0.01 to stop the trial early and claim efficacy.
2. Spend the remaining four-fifths of the 5 percent alpha at the end.
The end analysis p value must be < 0.04 to claim efficacy.
This strategy preserves the 5 percent overall alpha level while still giving the drug a chance to prove itself at an early point in the trial.
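The two-part rule above can be written as a tiny decision function (the function name and return strings are my own). Because 0.01 + 0.04 = 0.05, a Bonferroni-style argument guarantees the overall false-positive chance can't exceed 5 percent; real trials generally use more refined group-sequential methods, such as O'Brien-Fleming boundaries, rather than this simple split:

```python
def two_stage_decision(p_interim, p_final=None,
                       alpha_interim=0.01, alpha_final=0.04):
    """Apply the simple alpha-spending rule from the text: spend 0.01
    of the overall 0.05 alpha at the interim look and the remaining
    0.04 at the final analysis."""
    if p_interim < alpha_interim:
        return "stop early: efficacy shown at interim"
    if p_final is not None and p_final < alpha_final:
        return "efficacy shown at final analysis"
    return "efficacy not shown"

print(two_stage_decision(0.004))       # strong interim result: stop early
print(two_stage_decision(0.08, 0.02))  # continue to the end, then succeed
```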