Chapter 3
Design Considerations in the Presence of Missing Data

Researchers themselves bear part of the blame for the high prevalence of missing data in biomedical, behavioral, and social research. Many missing data problems arise from poor study design and a lack of careful planning, and they can be largely avoided if enough attention is given at the beginning of the study. As Benjamin Franklin put it, “an ounce of prevention is worth a pound of cure.” A small amount of planning ahead can often greatly reduce bias and improve efficiency, and can sometimes even salvage an otherwise wasted effort. In this chapter, we outline some design and conduct strategies to avoid or reduce missing data in biomedical research studies. Most of the advice is based on a recent National Research Council report (National Research Council, 2010) and a special report in the New England Journal of Medicine (Little et al., 2012b). More technical information can be found in Little et al. (2012a).

3.1 Design Factors Related to Missing Data

The best advice regarding missing data is probably that given by R.A. Fisher: “the best solution to handle missing data is to have none.” This may sound obvious; however, it is very difficult to achieve in practice. In most clinical studies, some data will be missing. Because the problem is so pervasive that it is deemed unavoidable, some researchers put little effort into study design and data collection to prevent it. However, we believe that every effort should be made early to reduce missing data. Not only does this retain the most information for estimation and inference, it also encourages researchers to think more carefully about their scientific question, study design, data collection, and analysis.

To avoid missing data, it is helpful to understand what factors could affect the availability of data. In the setting of clinical trials, the following factors tend to be highly influential:

  1. The Ease of Assessment and Collection of the Variables: Missing values are expected to occur less often when the outcome variable is easy to assess, such as mortality (e.g., in cardiovascular trials), than when the outcome is more difficult to assess and requires the active participation of patients and/or sophisticated methods of diagnosis.
  2. The Quality of the Definition of the Variables: Subjects respond better to clearly defined outcome measures than to vaguely defined ones.
  3. The Nature of the Variables: Subjects respond more readily to questions about less personal or sensitive variables. Variables such as HIV infection status, psychiatric disorders, and income are hard to collect.
  4. The Amount of Data Collected on Individuals: Missing data become more frequent when the respondents feel that the data collection process is too burdensome.
  5. The Duration of the Clinical Trials: The longer the follow-up, the greater the likelihood of missing values.
  6. The Therapeutic Indication: Missing values are more frequent in those diseases where the adherence of patients to the study protocol is usually low (e.g., psychiatric disorders).
  7. The Treatment Modalities: For example, surgical versus medical treatment.
  8. Motivation of Participants to Remain in the Study: Patients who tolerate the intervention better or are more likely to benefit from the study tend to remain in the study.

When planning a study, researchers need to evaluate the factors above and find ways to limit their contribution to missing data. Several major difficulties arise from missing values, and they are aggravated as the number of missing values increases, so it is extremely important to avoid unobserved measurements as much as possible. Researchers should favor designs that minimize this problem, strengthen data collection so that it continues regardless of the patient's adherence to the protocol, and encourage the retrieval of data, for example through continued access to medical records after the patient drops out. In some circumstances, in particular where retrieved dropout information represents the progression of the patient without (or before) the impact of further therapeutic intervention, these data will give the best approximation to the intention-to-treat (ITT) population.

3.2 Strategies for Limiting Missing Data in the Design of Clinical Trials

As pointed out by Lavori et al. (2008), a good study design may not completely eliminate the problem of missing data, but it can reduce the amount of missing data or their impact. Statistical methods such as those covered in this book can then be applied at the analysis stage to make proper estimation and inference. On the other hand, if insufficient attention is paid to missing data at the design stage, the damage to the data may be so severe that the inferential problems cannot be resolved in the statistical analysis phase.

There are design elements for clinical trials that can help to prevent missing data by increasing patient participation, reducing patient dropouts, and improving data collection. We recommend the following design strategies to reduce missing data in clinical trials; many of them have been discussed in National Research Council (2010). Note that much of the advice also applies to observational studies.

  1. Improve Data Collection
    • Focus on the research objectives and only collect data that are absolutely necessary to address the scientific questions. Researchers are often tempted to collect as much information as possible. But too much unnecessary data collection tends to put stress on resources, staff, and participants.
    • Design the data collection carefully to focus on the most pertinent information so that subjects will not be overwhelmed with reporting.
    • Provide clear definitions to each of the variables to avoid ambiguity. Missing data often occur when the respondents are unclear about a question or variable.
    • Consider multiple modes of data collection. In addition to participant self-reports, administrative data sets, electronic medical records, census data, and online surveys can also be used.
    • For longitudinal or repeated measures studies, give extra flexibility to the data collection schedule.
    • Plan the study period carefully so that it is not unnecessarily long, which can lead to more dropouts. For example, shorten the follow-up period for the primary outcome.
    • Identify variables that are most likely to be missing, and plan to spend extra effort to collect them.
  2. Improve Study Population Selection
    • Select the study population carefully. Target the population that is not adequately served by current treatments and hence has an incentive to remain in the study. Subjects with more severe disease or poor health status are more likely to drop out or not respond, so excluding them may reduce missing data. However, be mindful that such an exclusion may limit the generalizability of the study findings.
    • Include a run-in period during which all patients are assigned to the active intervention, and after which only those who tolerated and adhered to the therapy undergo randomization. The potential drawbacks are carry-over effects and unblinding of patients to intervention.
  3. Improve Intervention
    • Consider add-on interventions, in which a study treatment is added to an existing treatment, typically with a different mechanism of action known to be effective in previous studies.
    • Allow rescue medications that are designated as components of a treatment regimen in the study protocol.
    • Provide continued access to effective treatments after the trial and before the treatment is approved.

3.3 Strategies for Limiting Missing Data in the Conduct of Clinical Trials

The incidence of missing data varies greatly across studies. Some of this variation is context specific, but in many cases more attention to planning and conduct can substantially reduce the problem. Some of these strategies are listed below. Many of the ideas are discussed in more detail in the panel report of Little et al. (2012a,b).

  1. Better Training of Investigators, Staff, and Patients
    • Train investigators and research staff so that they can keep participants in the trial until the end, regardless of whether the participants continue to receive the assigned treatment.
    • Select investigators and clinical research associates with a good track record of enrolling and following participants and collecting complete data in previous trials.
    • Educate patients about the importance of providing complete data even if they choose not to follow the assigned treatment.
  2. Better Data Tracking
    • Set acceptable target rates of missingness in the data and monitor the progress of the trial with respect to these targets.
    • Such targets and monitoring can also be tied to the research staff's performance evaluation.
    • Apply double or multiple data entry or other data checking methods to identify data issues on an ongoing basis. Identifying missing data early may allow the research staff to re-collect the data in time.
    • Keep contact information for participants up to date. Collect multiple contact numbers or e-mails during enrollment so that participants are more likely to be located if they move out of the study area.
    • Collect information from participants regarding the likelihood that they will drop out, and use this information to attempt to reduce the incidence of dropout. For example, insurance status or employment status tends to have a strong influence on a participant's decision to remain in the study.
  3. Better Patient Experience
    • Limit the burden and inconvenience of data collection on the participants. For example, allowing participants to fill out Web-based surveys on their own time and at their own pace may be preferable to paper surveys.
    • Incentives often help, but there could be political, financial, or administrative restrictions.
    • Make the study experience as positive as possible. Sometimes small things such as free parking or a friendly greeting from the research staff can make a big difference.
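
The advice under "Better Data Tracking" to set target rates of missingness and monitor the trial against them could be implemented in many ways. The following is only a minimal sketch, assuming that each site reports how many assessments were due and how many were actually collected. The function name `flag_sites`, the `SiteStatus` record, and the 10% default target are hypothetical choices made for illustration, not values taken from the reports cited above.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    site: str
    missing_rate: float
    over_target: bool


def flag_sites(expected, received, target_rate=0.10):
    """Compare each site's missing-data rate with a prespecified target.

    expected: dict mapping site -> number of assessments due so far
    received: dict mapping site -> number of assessments actually collected
    target_rate: acceptable proportion missing (hypothetical default)
    """
    report = []
    for site, n_due in expected.items():
        if n_due == 0:
            continue  # nothing due yet, so no rate can be computed
        n_missing = n_due - received.get(site, 0)
        rate = n_missing / n_due
        report.append(SiteStatus(site, rate, rate > target_rate))
    # Sites with the highest missing rate come first
    return sorted(report, key=lambda s: s.missing_rate, reverse=True)


# Illustrative counts only; these are not real trial data
expected = {"A": 200, "B": 150, "C": 0}
received = {"A": 190, "B": 120}
for status in flag_sites(expected, received):
    print(f"{status.site}: {status.missing_rate:.1%} missing, "
          f"over target: {status.over_target}")
```

Running such a check at regular intervals, rather than once at the end of the trial, is what gives the research staff time to follow up on the missing assessments.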

3.4 Minimize the Impact of Missing Data

There is no fixed rule for the maximum acceptable number of missing values. Although the conventional wisdom is that more than 30% missingness should raise an alarm for the analyst, the impact of missing data is determined by the missing information rate, which depends on both the sample size and the statistical model for complete data. See Little and Rubin (2002) for a detailed discussion on the definition of the missing information rate.
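
As a rough illustration of why the missing information rate differs from the raw percentage of missing values, the sketch below computes one commonly used large-sample estimate of the fraction of missing information in the multiple imputation setting, based on Rubin's combining rules. It assumes the analysis has already produced a point estimate and a squared standard error from each of m imputed data sets. The function name is hypothetical, and this particular estimator is an assumption made here for illustration; it is not necessarily the definition discussed in Little and Rubin (2002).

```python
from statistics import mean, variance


def fraction_missing_information(estimates, within_variances):
    """Large-sample fraction of missing information under Rubin's rules.

    estimates: point estimates from m imputed data sets
    within_variances: the corresponding squared standard errors
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    w = mean(within_variances)     # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b    # total variance of the pooled estimate
    return (1 + 1 / m) * b / total


# Illustrative numbers only; these are not results from any real study
fmi = fraction_missing_information(
    estimates=[1.0, 1.2, 0.8, 1.1, 0.9],
    within_variances=[0.04, 0.04, 0.04, 0.04, 0.04],
)
print(f"Estimated fraction of missing information: {fmi:.2f}")
```

The estimate grows with the disagreement among the imputed analyses relative to their within-imputation variance, so two studies with the same percentage of missing values can have quite different missing information rates.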

When designing the study and specifying the statistical methods to be used, it is very important to anticipate the number of missing values likely to occur in the trial. Experience from exploratory trials and from trials of other products in similar indications should inform expectations for missing data when planning the trial. Careful planning will help specify both a plausible approach to handling missing data and a range of sensitivity analyses that could explore the impact of departures from the expected missing data pattern. Indeed, an estimate of the foreseen and acceptable amount of missing data is highly recommended for the following reasons: first, missing data may affect the variability and the expected effect size, and hence the sample size calculation; second, proper planning will minimize the risk that the strategy for handling missing data itself introduces bias; and third, the uncertainty in interpreting the results, and hence the number of sensitivity analyses that may be required, increases with the number of missing values.
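
One simple way the anticipated amount of missing data can enter the sample size calculation is to inflate the number of subjects enrolled so that the expected number with observed outcomes still meets the original requirement. The sketch below shows this back-of-the-envelope adjustment. It is a common rule of thumb rather than a method taken from the sources cited in this chapter, the function name is hypothetical, and it assumes that only subjects with observed outcomes contribute to the analysis. It addresses loss of precision only and does nothing about bias caused by the missing data.

```python
import math


def inflate_sample_size(n_required, expected_missing_rate):
    """Number of subjects to enroll so that about n_required have observed outcomes.

    n_required: sample size from a calculation that assumes complete data
    expected_missing_rate: anticipated proportion of subjects with missing outcomes
    """
    if not 0 <= expected_missing_rate < 1:
        raise ValueError("expected_missing_rate must be in [0, 1)")
    # Divide by the expected proportion observed and round up to a whole subject
    return math.ceil(n_required / (1 - expected_missing_rate))


# Illustrative inputs: 128 subjects needed, 15% expected to have missing outcomes
print(inflate_sample_size(128, 0.15))  # prints 151
```

Because the adjustment depends directly on the anticipated missing rate, an unrealistic guess at the planning stage translates into an underpowered or unnecessarily large trial.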

There is no universally applicable method of handling missing values, and different approaches may lead to different results. To avoid concerns over data-driven selection of methods, it is essential to prespecify the selected methods in the statistical section of the study protocol or analysis plan. This section must include a detailed description of the selected methods and a justification of why they are expected to summarize the efficacy results of the study appropriately without introducing bias in favor of the experimental treatment. The sensitivity analyses to be performed should also be prespecified.

When missing data are unavoidable, there are ways to reduce their impact on the inference:

  • Thoroughly consider all possible factors that may result in missing values.
  • Collect auxiliary data that could help imputation of missing values.
  • Collect secondary outcomes that capture information similar to the primary outcomes but are easier to collect.
  • Where possible, outcome data after withdrawal should be collected. Data should also be collected on other therapies received after dropout. Specifically, full details of the type of therapy given, including when and for how long it was used and at what dose, should be collected. This information will allow the value of any outcome data collected after withdrawal to be put into context.