Chapter 14: Using Real World Data to Examine the Generalizability of Randomized Trials

14.1 External Validity, Generalizability and Transportability

14.2 Methods to Increase Generalizability

14.3 Re-weighting Methods for Generalizability

14.3.1 Inverse Probability Weighting

14.3.2 Entropy Balancing

14.3.3 Assumptions, Best Practices, and Limitations

14.4 Programs Used in Generalizability Analyses

14.5 Analysis of Generalizability Using the PCI15K Data

14.5.1 RCT and Target Populations

14.5.2 Inverse Probability Generalizability

14.5.3 Entropy Balancing Generalizability

14.6 Summary

References

14.1 External Validity, Generalizability and Transportability

Research studies are usually conducted with the final aim of applying the results to individuals beyond those participating in the study. In order to achieve this goal, the study needs to attain both good internal validity (are the results of the study true for the participants in the study?) and external validity. External validity is the degree to which we can apply the results of the study to individuals other than the ones included in the study (Murad et al. 2018).

The external validity of a study depends on multiple factors, including the type of patients enrolled in the study, how treatments are implemented, and the outcome measure. Regarding patients, inclusion and exclusion criteria can be restricting or apply to a broad number of patients. A given treatment might be more effective for some types of patients or may be less appropriate in more severe patients or patients with co-morbidities. The treatment setting and implementation of treatments may vary, which can influence adherence and the effectiveness of a medication.

A common criticism of RCTs is low external validity. RCTs are typically designed to maximize internal validity by minimizing bias (selection bias via randomization, information bias via blinding) and variability (very specific patient populations with exclusion of those very severe or with comorbidities, strict treatment protocols, and exclusion of use of concomitant medications). These methodological characteristics maximize the power of the study to detect a treatment difference if one exists, but at the same time limit the direct generalization of findings. A literature review conducted by Kennedy-Martin et al. (2015), which interrogated a range of databases (MEDLINE, EMBASE, Science Citation Index, and the Cochrane Methodology Register), showed that the majority of RCTs include samples of patients that are not broadly representative of real world patients. Specifically, Kennedy-Martin et al. reviewed studies that assessed external validity by comparing the patient samples included in clinical trials of pharmaceutical interventions with patients from everyday clinical practice. They identified 52 studies, 20 in cardiology, 17 in mental health, and 15 in oncology. In 71.2% of the studies, the individual study authors concluded that RCT samples were not representative of patients encountered in clinical practice or that population differences could have a relevant impact on the external validity of the RCT findings. For example, in cardiology, patients encountered in everyday practice were more likely to have higher risk characteristics such as older age, female sex, and greater clinical impairment and co-morbid disease. The systematic exclusion of more severe patients from RCTs was also present in mental health and oncology.

On the other hand, observational research is considered to have greater external validity than RCTs. Observational research tends to include samples of patients that are more representative of the overall population of patients with a given disorder, and the circumstances of the treatment are more similar to routine clinical care. In observational research, inclusion criteria are often very wide and exclusion criteria are usually very limited, thus findings generalize much more readily to the broader population of patients.

We can apply, or generalize, the study results to those individuals from the population from which the sample was drawn. Generalizability is a statistical concept that can be predicted and estimated using sampling theory and tested by assessing the representativeness of the sample that was included in the study compared to the population of origin. If the study has been properly designed and conducted, we can generalize the findings of the study to the population that fulfills the inclusion and exclusion criteria of the study.

Westreich et al. (2017) state “unless the study sample was sampled at random from the target population, there is no expectation of exchangeability of the study sample and the target population.” Yet, as we have seen above, most study samples have not been sampled at random from the eligible population for reasons of either design (for example, a trial might exclude more severe people because they might be more difficult to be treated in the experimental situation) or implementation (clinicians may prefer to enroll patients who will be expected to be compliant with the treatment; individuals with some personal characteristics may be less prone to accept participating in a study). If this is the case, although we might have results with high internal validity, the results from the individuals included in the study may differ from the original population. In these cases, despite having an internally unbiased sample average treatment effect, the treatment effect from the study sample can differ from the treatment effect in the target population of interest. Generalizability is the extent to which we can apply internally valid study results to the target population.

This chapter will present methods to assess the generalizability of study findings and how to estimate the effects of treatment in target populations when sampling has not been random. A common setting is the need to assess the generalizability of an RCT to a target population that is based on a real world study population or database. This setting commonly occurs at the introduction of a new drug to the marketplace. Treatment effect estimates from recent RCTs are available, but there is little or no data on how the new medication performs in a real world or general population. The combination of the information from RCTs and real world data can be used to assess the generalizability of the RCT evidence. Given the importance of this issue (Mumford and Schisterman 2019), generalizability methods were a part of the GetReal project from the Innovative Medicines Initiative (IMI). The IMI is a European public-private partnership to improve drug development. Thus, readers are referred to the IMI GetReal website (https://www.imi-getreal.eu/) as additional publications and tools come forth from this collaboration.

In some cases, we might want to extrapolate the study findings to a population that is partially or completely non-overlapping with the target population from which the sample of the study was drawn. Although this problem is related to the generalizability problem, it is usually called transportability (Westreich et al. 2017) and is not discussed further here. Dahabreh and Hernán (2019) define generalizability as extending inferences from the trial to the trial-eligible population (or a subset thereof) and transportability as drawing inferences about a population at least partially outside of the trial-eligible sample.

14.2 Methods to Increase Generalizability

The best method to ensure representativeness of the results of a clinical trial is to enroll a probabilistic sample from the target population. However, this requires a listing of the target population, a method to select and enroll a probabilistic sample of participants, and, more importantly, achieving a very high participation rate among those chosen. This is not feasible except in very few cases. In all other circumstances, investigators might need to perform additional analyses to know whether the results of a given trial are generalizable to the target population.

The “Reach Effectiveness Adoption Implementation Maintenance” (Re-AIM) framework by Green and Glasgow (2006) is a widely used model to analyze generalizability of study findings. This model maps summary measures of external validity at the individual level and the setting or organizational level. For example, at the individual level, the model assesses the participation rate or the representativeness, the differential subpopulation response, and/or the maintenance of the response. At the setting or organizational level, the model comprises aspects such as adoption and implementation issues. However, this chapter will focus on the statistical methods to assess and improve generalizability by focusing on patient-level characteristics.

The simplest method to generalize estimated population treatment effects from the results of a clinical trial to a target population is post-stratification adjustment. In post-stratification adjustment, we independently estimate the treatment effects for several strata of the population, which have been created because they are known factors that modify the treatment effect. Then we use population proportions to estimate the treatment effect in the target population by applying the specific treatment effect for each stratum to the proportion of individuals in that stratum. This has been named “direct adjustment.” As an example, imagine that menopausal status affects the response to a treatment for women and that our sample has a higher proportion of post-menopausal women than the general population. In this case, we might want to estimate the treatment effect independently for pre- and post-menopausal women and use the proportion of women in each stratum from the target population to calculate the population effect. Although this method is very simple, it can only be applied if the number of variables to be controlled are small since strata geometrically increase with number of variables and categories and a sufficient number of individuals must be available in each stratum to reliably estimate the within-strata treatment effects.
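The menopausal-status example can be made concrete with a small Python sketch of direct adjustment; all stratum effects and proportions below are hypothetical numbers chosen purely for illustration:

```python
# Direct adjustment: combine hypothetical stratum-specific treatment
# effects using hypothetical target-population stratum proportions.
effects = {"pre-menopausal": 4.0, "post-menopausal": 1.5}
target_props = {"pre-menopausal": 0.7, "post-menopausal": 0.3}

# Population effect = sum over strata of (stratum effect x stratum proportion)
population_effect = sum(effects[s] * target_props[s] for s in effects)
print(population_effect)  # 0.7*4.0 + 0.3*1.5 = 3.25
```

If the RCT over-represented post-menopausal women, the unweighted trial estimate would be pulled toward 1.5, while the directly adjusted estimate reflects the target mix.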

A straightforward extension of the direct adjustment approach when there are many factors that can influence outcome is re-weighting methods similar to those presented in Chapter 8. Over the last couple of decades, re-weighting methods have been proposed and are now increasingly being used to quantitatively assess the representativeness of study results. Beginning with the next section, the remainder of the chapter presents the use of re-weighting methods.

Though not discussed further in this book, a quick summary of other methods including evidence synthesis is provided here in the context of generalizability. In cases where there are multiple studies analyzing the effects of a given treatment in the same or similar populations, meta-analysis and research synthesis approaches can be used. We present them here since they are often viewed as increasing the generalizability of results. Because they integrate evidence from different studies, and because a larger number of contributing studies increases the variability of the samples and treatment conditions represented, these methods potentially produce more internally and externally valid results. Meta-analytical techniques can be used to summarize the results of a number of studies into an overall treatment effect estimate (Cooper and Hedges 2009). Meta-analytic techniques estimate the overall treatment effect either by assuming that the treatment effect is the same across studies (fixed effects model) or that the real treatment effects may differ among studies (random effects model). Meta-analytic techniques can take into account not only the different sample sizes of the studies that are included but also methodological issues related to their internal validity. However, the generalizability of findings from the meta-analysis will be contingent on the included studies adequately representing the diversity of the target population.

Cross-design synthesis (Kaizar 2011), which shares similarities with meta-analysis, is another approach with the potential to more properly produce generalizable findings. Cross-design synthesis combines the results from studies that have complementary design, namely clinical trials and observational studies (Droitcour et al. 1993). The basic steps of cross-design synthesis are to assess existing randomized studies for generalizability across the full range of patients and settings; assess the existence of observational studies or other type of studies that could be informative of the patients with the disorder; adjust the results of the randomized trials, compensating biases as required; and synthesize all study results. Research synthesis does this by modeling multiple parameters from each study and incorporating study characteristics into the model, including, for example, beliefs regarding the relative merits of the multiple sources of evidence (Stuart et al. 2015). However, further consensus on standards for cross-design synthesis is needed.

Lastly, Didden et al. (2018) demonstrated a regression-based approach to generalizability that is applicable when one can identify another treatment on the market that is used in a population similar to that of the new treatment of interest. By building a model with both the RCT data with the new treatment and real-world data with the existing treatment, they show how to estimate the effect of the new treatment in a simulated target population of interest.

14.3 Re-weighting Methods for Generalizability

In brief, re-weighting methods for generalizability simply re-weight the patients in the RCT such that the weighted population mimics a target population of interest. This is a two-step process: 1) obtain weights for the RCT patients to match the target population characteristics, and 2) estimate the treatment difference using the weighted RCT data. The weighted analysis is critical due to the potential for heterogeneous treatment effects across the population. That is, the treatment effect is different in different types of patients.

Two different approaches for obtaining the individual patient weights are presented in this chapter. First, the use of inverse probability weighting (Stuart et al. 2011) uses combined data from the RCT and the target population sample to estimate patient weights in a similar fashion as inverse propensity score weighting presented in Chapter 8. This produces a weighted population with means for each covariate that are close to the means in the target population. Second, entropy balancing is an algorithm that searches for a set of patient weights that produces exact balance with the target population across a set of covariates.

Both approaches require that data on all potential effect modifiers are available from both the RCT and from the target population. The inverse probability weighting also typically requires that individual level patient data is available from both the RCT and the target population. However, the entropy balancing approach, while needing individual patient level data from the RCT, only requires that summary level data is available from the target population. Neither approach requires the outcome data in the real world data source (target population) and are thus applicable in the setting of evaluating generalizability at the time of the launch of a new medication.

14.3.1 Inverse Probability Weighting

Cole and Stuart (2010) introduced the concept of using inverse probability of selection weighting to assess the generalizability of an AIDS prevention trial. In this setting, the RCT was considered as a subsample from the broader target population as quantified by the Centers for Disease Control and Prevention data on individuals with HIV. Specifically, patients in the trial were re-weighted using the stabilized version of the inverse probability of selection into the RCT.

This concept was broadened and formalized by Stuart et al. (2011) who proposed the use of inverse propensity weighting as a tool for generalizability. Pressler and Kaizar (2013) also followed the concept of inverse propensity score weighting with the use of observational data as the target population to assess generalizability. The propensity score concept here is similar to that introduced in Chapter 4, except that the probability is not the probability of a particular treatment but rather the probability of being enrolled in the RCT. The models that estimate the propensity scores should include the known effect modifiers that we have information about in the sample of our study and target population.

Following the notation of Lin et al. (to appear), the goal is to estimate the average treatment effect (ATE) in the target population (TATE):

TATE = (1 / N_T) Σ_{i ∈ T} [Y_i(1) − Y_i(0)]

where N_T is the target population sample size and T represents all individuals in the target population.

The estimand is the difference in expected values for each treatment in the target population:

TATE = E[ E(Y | X_i, T = 1) − E(Y | X_i, T = 0) | S_i = 0 ]

where X_i is a p-dimensional vector of patient characteristics, and S_i = 0 denotes subjects from the target population. With a data set combining the RCT and the target population, one can estimate the probability of a patient being in the RCT (denoted by e(X_i) = Pr(S_i = 1 | X_i)). Then the stabilized and re-scaled weights for each patient in the RCT are defined as follows:

w_i = (1 − e(X_i)) / e(X_i),    ŵ_i = w_i / w̄

where w̄ is the mean of the unscaled weights w_i across the RCT patients, so that the ŵ_i sum to the RCT sample size. The estimated TATE can then be written as

TÂTE = [Σ_{S_i = 1} ŵ_i T_i Y_i / Σ_{S_i = 1} ŵ_i T_i] − [Σ_{S_i = 1} ŵ_i (1 − T_i) Y_i / Σ_{S_i = 1} ŵ_i (1 − T_i)]

where Y_i is the outcome variable measured in the RCT and T_i is an indicator of active treatment. One can perform inference by using a weighted regression model in SAS (accounting for the weighting by use of SURVEYREG or the sandwich estimator from GENMOD) or by bootstrapping for the variance calculation. Thus, the analysis is a two-step process of estimating the weights for each patient and then performing a weighted treatment comparison from the RCT data.
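The two-step process can be sketched in a few lines of Python; the selection probabilities, treatment indicators, and outcomes below are simulated placeholders (in practice, the probabilities come from a selection model fit to the combined RCT and target population data):

```python
import numpy as np

rng = np.random.default_rng(1)
n_rct = 100

# Placeholder fitted probabilities e(X_i) = Pr(S_i = 1 | X_i) for the
# RCT patients -- simulated here, not from any real selection model.
e_rct = rng.uniform(0.3, 0.9, n_rct)

# Step 1: weights (1 - e)/e, rescaled so they sum to the RCT sample size.
w = (1 - e_rct) / e_rct
w = w / w.mean()

# Step 2: weighted treatment comparison within the RCT
# (simulated outcome with a true treatment effect of 2 for illustration).
T = rng.integers(0, 2, n_rct)
Y = 2.0 * T + rng.normal(size=n_rct)
tate_hat = (np.average(Y[T == 1], weights=w[T == 1])
            - np.average(Y[T == 0], weights=w[T == 0]))
print(round(tate_hat, 2))  # near the simulated effect of 2
```

Because the weights are rescaled by their mean, they sum to the RCT sample size, matching the rescaling done in the SAS code of Section 14.4.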

As with the inverse propensity score analyses in Chapter 8, the potential exists for non-overlapping propensity score distributions between the RCT and target population individuals and/or extreme weights. This will result in very large variances and an inability to draw meaningful inferences from the data. Stuart et al. (2011) proposed the use of the propensity score as a measure of similarity of the RCT and target populations. Program 5.1 from Chapter 5 provides this feasibility assessment, including the propensity score and other balance measures, and can help assess whether credible generalizability estimation is possible. In addition, the effective sample size (ESS = (Σ wi)^2 / Σ wi^2) serves as a good summary of the impact of re-weighting on power and precision and thus is included in the output from the SAS code in Section 14.4.
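The ESS formula is simple enough to verify directly; a small Python sketch:

```python
import numpy as np

def effective_sample_size(w):
    """ESS = (sum w)^2 / sum w^2: equals n when all weights are equal
    and shrinks toward 1 as the weights become more variable."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

print(effective_sample_size(np.ones(100)))      # 100.0 (no impact of weighting)
print(effective_sample_size([1, 1, 1, 1, 16]))  # 20^2 / 260, about 1.54
```

A single dominant weight collapses the ESS: five patients with one extreme weight carry roughly the information of one and a half patients.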

Lin et al. (to appear) examined the bias variability tradeoff from trimming weights in the context of generalizability analyses. In a simulation, they examined multiple trimming approaches to examine the bias variability tradeoff under a variety of scenarios. In brief, the best trimming options depend on both the amount of heterogeneity in treatment effect and the level of selection bias between the RCT and target population. Thus, strategies allowing flexible trimming strategies based on the data were recommended. Stuart et al. (2011) also suggested the creation of weights from propensity stratification and full matching as an approach to avoid extreme weights.
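One simple trimming strategy, a percentile cap (chosen here purely for demonstration and not the specific approaches evaluated by Lin et al.), illustrates the tradeoff: tighter trimming raises the effective sample size at the cost of some bias in matching the target population. A Python sketch with simulated heavy-tailed weights:

```python
import numpy as np

def trim_weights(w, pct):
    """Cap weights at the given percentile, then rescale so the mean
    weight is 1 (weights sum to n), as in Program 14.1."""
    w = np.asarray(w, dtype=float)
    w = np.minimum(w, np.percentile(w, pct))
    return w / w.mean()

rng = np.random.default_rng(2)
w = rng.lognormal(sigma=1.5, size=500)   # illustrative heavy-tailed weights
for pct in (100, 99, 95, 90):
    wt = trim_weights(w, pct)
    ess = wt.sum() ** 2 / (wt ** 2).sum()
    print(pct, round(ess, 1))            # ESS typically rises as trimming tightens
```

Comparing the ESS across caps is one way to let the data guide the flexible trimming strategies recommended above.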

14.3.2 Entropy Balancing

This section shows how the entropy balancing algorithm (Hainmueller 2010) can be used to assess the generalizability of RCT evidence. The steps are the same as in Section 14.3.1, except that the patient weights are computed by the entropy balancing algorithm. Entropy balancing was introduced and demonstrated in Chapter 8 in the context of producing a causal treatment effect estimate for comparative effectiveness analyses. The formula and code are provided in Chapter 8. Thus, only a brief introduction is given here.

Instead of estimating weights using a regression function as in the previous section, entropy balancing weights are simply computed by minimizing a loss function subject to a number of constraints. The algorithm finds a set of weights (one for each patient in the RCT) such that the following constraints are met: (1) the weighted average of the mean for each specified covariate is equal to the mean value for that covariate in the target population (and optionally for the variances or higher order moments as well), (2) the weights are positive, and (3) the sum of the weights equals N. The algorithm is also designed to penalize weights that differ from a balanced set of weights (zero loss when all weights are 1).
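A minimal Python sketch of this constrained optimization, solved through its convex dual and matching first moments only (the covariates and target means are simulated placeholders, and this is a simplified stand-in for the Chapter 8 macro):

```python
import numpy as np
from scipy.optimize import minimize

def entropy_balance(X, target_means):
    """Find positive weights for the rows of X so that the weighted
    column means equal target_means and the weights sum to the number
    of rows (first-moment constraints only)."""
    n = X.shape[0]
    C = X - target_means                 # constraints: weighted mean of C = 0
    def dual(lam):                       # convex dual of the entropy objective
        return np.log(np.exp(C @ lam).sum())
    res = minimize(dual, np.zeros(X.shape[1]), method="BFGS")
    w = np.exp(C @ res.x)                # positive by construction
    return w / w.sum() * n               # rescale so the weights sum to n

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) + [0.5, -0.2]   # simulated "RCT" covariates
w = entropy_balance(X, np.array([0.0, 0.0]))  # hypothetical target means
print(np.average(X, axis=0, weights=w))       # approximately [0, 0]
```

The weighted covariate means hit the target exactly (to optimizer tolerance), which is the defining difference from the approximate balance of inverse probability weighting.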

Thus, the “balancing” produces a weighted RCT population such that the baseline characteristics of the weighted RCT population is the same as the baseline characteristics of the target population. Treatment comparisons in this weighted RCT population are conducted using a weighted analysis and will represent an estimate of the treatment effect had the trial been conducted in a broader patient population.

While the individual patient weights do not have a meaningful interpretation as they do in inverse probability weighting, entropy balancing offers several potential advantages:

● There is no need for patient level data from the real world data population – all that is needed is RCT patient-level data plus the summary statistics from the real world population.

● The algorithm produces a weighted RCT population that exactly balances the means (and higher moments) to the target population of interest (based on data or hypothetical).

● There is no need for rescaling weights as the algorithm is designed such that the sum of the weights is N.

Both methods for computing weights, inverse probability and entropy balancing, are implemented using SAS in Section 14.5.

14.3.3 Assumptions, Best Practices, and Limitations

The generalizability of any research findings is often ignored or a simple qualitative statement about the representativeness of the research sample is made based on its baseline characteristics. This chapter has presented two approaches for quantifying the generalizability by estimating the treatment effect in the target population of interest.

As with the causal inference methods of Chapter 4–9, the validity of the generalizability analyses depends on several key assumptions. Understanding and evaluating these assumptions is a fundamental step in the proper application of weighting methods (Austin and Stuart 2015) and leads to a number of practical recommendations to ensure that the methods are properly used.

The first assumption is positivity, that is, each patient in the RCT has a positive (hypothetical) probability of being in the target population. Lin et al. (to appear) avoided difficulties with this assumption by excluding individuals from the target population who could not be enrolled in the RCT. For instance, if the presence of diabetes was an exclusion criterion for the RCT, patients with diabetes would be removed from the target population database. While researchers might wish to avoid such exclusions, having such patients in the target population database changes the problem to one of transportability of the findings instead of generalizability.

Exchangeability is a second key assumption. That is, given a set of covariates, the expected potential outcomes of the people who enrolled in the RCT are equal to those of individuals in the target population who are not in the RCT. This is also referred to as the no unmeasured confounders assumption. If the RCT and target population differ on some unmeasured factor that is related to outcome, then this assumption is violated. For instance, if more adherent patients enrolled in the RCT (and propensity for adherence is not measured), then adherence in the target population would be lower and could result in poorer outcomes than in the RCT. Thus, it is critical that information about all modifiers of the outcome is available from both data sources. Constructing a directed acyclic graph (DAG), as described in Chapter 4 for propensity modeling, can be a good practice here as well. This will help identify variables expected to be treatment modifiers (whose exclusion would lead to unmeasured confounding) and provide guidance to avoid including variables that do not influence the outcome and could simply reduce the effective sample size. Of course, the presence of unmeasured confounding cannot be formally tested, though some work has been done to estimate the potential influence of unobserved moderators in the RCT generalization problem (Nguyen et al. 2018). Chapter 13 addresses the unmeasured confounding issue in more detail.

While not fully an unmeasured confounding problem, it is possible that a construct is measured in both the RCT and target population study but is measured differently. For instance, disease severity might be measured by the patient on a 1–10 scale in the RCT but only mild, moderate, and severe categories in the target population. Similarly, information about comorbidities might be directly captured in one and indirectly in the other. When possible, pre-work should be done to harmonize variables to minimize any differences.

Several of the diagnostic tools for comparative effectiveness can be implemented for generalizability analyses as well. For instance, the feasibility and balance assessment programs of Chapter 5 can identify key differences between the RCT and target population and help with the positivity assessment. Cole and Hernán (2008) suggested a method to test the positivity assumption by calculating the mean and maximum weights and the mean and standard deviation of the stabilized weights. They suggest that if the mean of the stabilized weights is far from one or if there are very extreme values, then this could be an indication of non-positivity or that the propensity score model has been mis-specified.
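These weight summaries are easy to compute; a Python sketch in the spirit of the Cole and Hernán diagnostics, using simulated weights for illustration:

```python
import numpy as np

def weight_diagnostics(sw):
    """Summaries of stabilized weights: a mean far from 1 or very
    extreme maxima can signal non-positivity or a mis-specified
    selection model (Cole and Hernan 2008)."""
    sw = np.asarray(sw, dtype=float)
    return {"mean": sw.mean(), "sd": sw.std(ddof=1),
            "min": sw.min(), "max": sw.max()}

rng = np.random.default_rng(3)
sw = rng.lognormal(sigma=0.3, size=1000)
sw = sw / sw.mean()               # well-behaved illustrative weights
print(weight_diagnostics(sw))     # mean of 1, modest sd and maximum
```

In practice these summaries would be computed on the stabilized weights from the selection model of Section 14.3.1 before proceeding to the weighted outcome analysis.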

Literature on variance estimation for weighted analyses in the standard comparative effectiveness scenario recommends the use of the sandwich estimator or bootstrapping. Thus, in the code in Section 14.4 we utilize the sandwich estimator. There is little evaluation in the literature specifically regarding the variance of the weighted generalization estimators, though research is ongoing (Chen and Kaizar 2018). For instance, bootstrapping can easily be implemented as in Chapter 8, though decisions would need to be made about resampling only from the RCT or from both sets of data. Also, approaching the problem from a survey methods point of view leads to the use of PROC SURVEYREG; code is provided in a technical note in Section 14.4.

Lastly, this is a relatively new application area for weighting methods and best practices are evolving. For instance, current recommendations are to compute the weights for the entire RCT population because randomization should protect against differences in covariates between treatment groups within the RCT. However, one could separately re-weight each treatment group in the RCT to the target population. Such practices have not been fully evaluated.

14.4 Programs Used in Generalizability Analyses

Program 14.1: Inverse Probability Weighting for Generalizability

**************************************************************************

* Generalizability Using IPW                                             *

*************************************************************************;

* As described in Section 14.5, this code is based on the PCI15K dataset and requires  *

* the data is in a one observation per patient format and contains data from both the  *

* RCT and the target population. COMBINED is a dataset containing the RCT and target   *

* population data with 1) common covariates:  Stent, Female, Diabetic, AcuteMI, Height,*

* EJFract, 2) a flag (RCTflag) indicating whether the patient is from the RCT (a value *

* of 1) or not (a value of 0), 3) and the outcome variable (Cardcost). COMBINED0      *

* is the subset of COMBINED containing the target population plus the control group   *

* from the RCT.  Similarly, COMBINED1 is the subset dataset with the treated group of *

* RCT patients in addition to the target population.                                  *;

* Step 1:  Use PSMATCH to compute the weights for every patient in the RCT.            *

*  PSMATCH computes the probability of being in the RCT sample from a combined         *
*  dataset with both RCT and Target population patients. Weights are output into a     *
*  separate dataset (target population patients are assigned a weight of 1).           *
*  This is done separately for each treatment group.                                   *;

proc psmatch data=combined0;

  class rctflag Stent Female Diabetic AcuteMI;

  psmodel rctflag(Treated = '1') = height ejfract Stent Female Diabetic AcuteMI
     ves1proc;

  output out=ps0 ps=pr_rct0;  run;

proc psmatch data=combined1;

  class rctflag Stent Female Diabetic AcuteMI;

  psmodel rctflag(Treated = '1') = height ejfract Stent Female Diabetic AcuteMI
        ves1proc;

  output out=ps1 ps=pr_rct1;  run;

data ps1;

  set ps1;

  if RctFlag = 1 then wt = (1 - pr_rct1) / pr_rct1;

  if RctFlag = 0 then wt = 1;

  dum = 1; run;

data ps0;

  set ps0;

  if RctFlag = 1 then wt = (1 - pr_rct0) / pr_rct0;

  if RctFlag = 0 then wt = 1;

  dum = 1; run;

* Step 2: Rescale Weights such that the sum of weights is the N for the RCT *;

* part a);

  proc means data=ps1;

    where RctFlag = 1;

    var wt;

       output out=mn1 sum=sumwt mean=meanwt n=nwt;

    title 'Summary of weights before re-scaling'; run;

  proc print data=mn1;

    title 'testing output from Means procedure'; run;

  data mn1;

    set mn1;

    dum=1;

    keep dum meanwt;

  proc sort data=mn1; by dum; run;

  proc sort data=ps1; by dum; run;

  data ps1_rct;

    merge ps1 mn1;

       by dum;

       if RCTflag = 1;

    rwt = wt / meanwt;

       inv_rwt = 1 / rwt;

       wy = rwt*cardcost;

       rwtsq = rwt*rwt;

       run;

* part b (control RCT patients);

  proc means data=ps0;

    where RctFlag = 1;

    var wt;

       output out=mn0 sum=sumwt mean=meanwt n=nwt;

    title 'Summary of weights before re-scaling'; run;

  proc print data=mn0;

    title 'testing output from Means procedure'; run;

  data mn0;

    set mn0;

    dum=1;

    keep dum meanwt;

  proc sort data=mn0; by dum; run;

  proc sort data=ps0; by dum; run;

  data ps0_rct;

    merge ps0 mn0;

       by dum;

       if RCTflag = 1;

    rwt = wt / meanwt;

       inv_rwt = 1 / rwt;

       wy = rwt*cardcost;

       rwtsq = rwt*rwt;

       run;

* merge back two parts of RCT sample;

data ps_rct;

  set ps1_rct ps0_rct; run;

proc means data=ps_rct n mean std min p10 median p90 max;

    class trt;

    var rwt;

    title 'Distribution of Rescaled Weights for RCT patients by treatment group'; run;

* Compute the Effective Sample Size *;

proc means data=ps_rct n mean sum std;

  var rwt rwtsq;

  output out=ESS sum = sum_rwt sum_rwtsq;

  title 'ESS calculations'; run;

data ess;

  set ess;

  effss = (sum_rwt**2) / sum_rwtsq;

proc print data=ess;

  title 'testing ESS'; run;

  

* Step 3: Estimated Re-Weighted Treatment Effect from the RCT *;

proc genmod data=ps_rct;  

    weight rwt;  

    class trt patid;  

    model Cardcost = trt;  

    repeated subject=patid; * REPEATED added to get "sandwich"

                              error estimation;  

    lsmeans trt / pdiff;  

    title 'IPW Weighted Analysis using GENMOD with sandwich error estimation';    

run;

proc genmod data=ps_rct;  

    weight rwt;  

    class trt patid;  

    model Cardcost = trt height ejfract stent diabetic acutemi female;    

    repeated subject=patid; * REPEATED added to get "sandwich"

                              error estimation;  

    lsmeans trt / pdiff;  

    title 'IPW Weighted Analysis adjusted for covariates using GENMOD with sandwich error estimation';    

run;

* rename dataset if running later EB code after this code as a single program;

data ps1_rct;

  set ps_rct; run;

Technical Note: One can consider the use of PROC SURVEYREG instead of GENMOD for the weighted regression analysis. The following code produces similar results to the GENMOD analysis:
proc surveyreg data=ps_rct;
  class trt;
  weight rwt;
  model Cardcost = trt / solution;
  lsmeans trt/pdiff;
  run;

Program 14.2: Entropy Balancing for Generalizability

*************************************************************************

* Generalizability Using Entropy Balancing                               *

***************************************************************************

* Note: the Macro call below uses 3 datasets: 1) TARGET is a subset of the COMBINED *

* dataset only including target population patients, 2) T0 contains only the Control*

* group patients from the RCT, 3) T1 contains only the Treated group from the RCT.  *

* To run the entropy algorithm using the combined RCT population (not by treatment  *

* group), the dataset T0T1 (both the T0 and T1 patients combined) is needed.        *

* First compute individual patient weights for each RCT patient such that the weighted RCT sample baseline variables will match (on first and 2nd moments) the target population. Note that the entropy balancing macro from Chapter 8 is used in this code. * ;                                          

* This first call to the entropy balancing macro is for the control group;

title1 "entropy balancing to generate weights for the Control group";

%ebc( caseinpds= Target /* input dataset with cases */

    , cntlinpds= T0 /* input dataset with controls */

    , covlistn= ejfract height /* list of continuous covariates */

    , covlistc= stent female diabetic acutemi /* list of categorical covariates */

    , idlistn= patid /* list of numerical ID variables */

    , idlistc=  /* list of character ID variables */

    , baseinpds= /* input dataset with base weights (optional) */

    , outds=t0w /* output dataset with controls and their calculated weights */

    , covnexc2= /* list of continuous covariates to be excluded from
              2nd moment balance (optional) */

    , solve=with nlp /* solver to be used - see proc optmodel */

    , minx=sum {i in indx} w[i]*log(w[i]/q[i]) /* objective function to minimize */

    , wbnd=1e-10 /* minimum weight allowed */

    , pres=aggressive /* type of preprocessing: see proc optmodel */

    , mom1=Y /* Y if 1st moment (i.e. mean) of covariates to be balanced */

    , mom2=Y /* Y if 2nd moment (i.e. variance) of covariates to be balanced */

    , logwchart=Y /* Y if log(w) chart to be produced */

    , debug=N

    , verbose=N);  run;

title1;

* This 2nd call to the entropy balancing macro is for the treated group;

title1 "entropy balancing to generate weights for the Treated group";

%ebc( caseinpds= Target /* input dataset with cases */

    , cntlinpds= T1 /* input dataset with controls */

    , covlistn= EJFract Height /* list of continuous covariates */

    , covlistc= stent female diabetic acutemi /* list of categorical covariates */

    , idlistn= patid /* list of numerical ID variables */

    , idlistc=  /* list of character ID variables */

    , baseinpds= /* input dataset with base weights (optional) */

    , outds=t1w /* output dataset with controls and their calculated weights */

    , covnexc2= /* list of continuous covariates to be excluded from
              2nd moment balance (optional) */

    , solve=with nlp /* solver to be used - see proc optmodel */

    , minx=sum {i in indx} w[i]*log(w[i]/q[i]) /* objective function to minimize */

    , wbnd=1e-10 /* minimum weight allowed */

    , pres=aggressive /* type of preprocessing: see proc optmodel */

    , mom1=Y /* Y if 1st moment (i.e. mean) of covariates to be balanced */

    , mom2=Y /* Y if 2nd moment (i.e. variance) of covariates to be balanced */

    , logwchart=Y /* Y if log(w) chart to be produced */

    , debug=N

    , verbose=N);  run;

title1;

* Concatenate Control and Treated datasets with entropy weights and conduct weighted analysis of the outcome variable;

proc print data=t1w (obs=10);
  title 'test listing of t1w'; run;

data EB;
  set t1w t0w;
  w_eb = w;
  keep patid w_eb;
run;

proc sort data=EB; by patid; run;

proc sort data=ps1_rct; by patid; run;

data eb;

  merge eb ps1_rct;

  by patid; run;

proc print data=EB (obs=5);
   title 'test listing of EB weights'; run;

proc means data=eb n mean std min p10 median p90 max;
   class thin;
   var w_eb ;
   title 'Summary of EB weights by Treatment Group'; run;

proc genmod data=eb;
    weight w_eb;
    class thin patid;
    model Cardcost = thin;
    repeated subject=patid; * REPEATED added to get "sandwich" error estimation;
    lsmeans thin / pdiff;
    title 'Entropy Weighted Analysis using GENMOD with sandwich error estimation';
run;

proc genmod data=eb;
    weight w_eb;
    class thin patid;
    model Cardcost = thin height ejfract stent diabetic acutemi female;
    repeated subject=patid; * REPEATED added to get "sandwich" error estimation;
    lsmeans thin / pdiff;
    title 'Entropy Weighted Analysis using GENMOD with sandwich error estimation';
run;

* This repeats the above Entropy Re-weighting analysis except the weights are computed only once for the full RCT sample (treatment groups combined);

data t0t1;

  set ps1_rct;

  run;

    

title1 "entropy balancing to generate weights for the RCT population";

%ebc( caseinpds= Target /* input dataset forming the target population */

    , cntlinpds= ps1_rct /* input dataset with controls */

    , covlistn= ejfract height /* list of continuous covariates */

    , covlistc= stent female diabetic acutemi /* list of categorical covariates */

    , idlistn= patid /* list of numerical ID variables */

    , idlistc=  /* list of character ID variables */

    , baseinpds= /* input dataset with base weights (optional) */

    , outds=ps1_rctw /* output dataset with controls and their calculated weights */

    , covnexc2= /* list of continuous covariates to be excluded from
              2nd moment balance (optional) */

    , solve=with nlp /* solver to be used - see proc optmodel */

    , minx=sum {i in indx} w[i]*log(w[i]/q[i]) /* objective function to minimize */

    , wbnd=1e-10 /* minimum weight allowed */

    , pres=aggressive /* type of preprocessing: see proc optmodel */

    , mom1=Y /* Y if 1st moment (i.e. mean) of covariates to be balanced */

    , mom2=Y /* Y if 2nd moment (i.e. variance) of covariates to be balanced */

    , logwchart=Y /* Y if log(w) chart to be produced */

    , debug=N

    , verbose=N);  run;

title1;

* Concatenate Control and Treated datasets with entropy weights *

* and conduct weighted analysis of the outcome variable.        *;

data ps1_rctw;
    set ps1_rctw;
    w_eb = w;
    keep patid w_eb;
run;

proc sort data=ps1_rctw;  by patid; run;

proc sort data=ps1_rct;   by patid; run;

data ps1_rctw;
    merge ps1_rctw ps1_rct;
    by patid;
    w_ebsq = w_eb*w_eb;
run;

    

proc means data=ps1_rctw n mean std min p10 median p90 max;
    class thin;
    var w_eb ;
    title 'Summary of EB weights by Treatment Group'; run;

proc genmod data=ps1_rctw;
    weight w_eb;
    class thin patid;
    model Cardcost = thin;
    repeated subject=patid; * REPEATED added to get "sandwich" error estimation;
    lsmeans thin / pdiff;
    title 'Weighted Analysis using GENMOD with sandwich error estimation';
run;

proc genmod data=ps1_rctw;
    weight w_eb;
    class thin patid;
    model Cardcost = thin height ejfract stent diabetic acutemi female;
    repeated subject=patid; * REPEATED added to get "sandwich" error estimation;
    lsmeans thin / pdiff;
    title 'Weighted Analysis using GENMOD with sandwich error estimation';
run;

proc means data=ps1_rctw n mean sum std;
  var w_eb w_ebsq;
  output out=ESS sum = sum_web sum_websq;
  title 'ESS calculations'; run;

data ess;
  set ess;
  effss = (sum_web**2) / sum_websq;
run;

proc print data=ess;
  title 'testing ESS'; run;

Technical Note: This code conducts the entropy weighting both separately for each treatment group and by combining treatment groups – though output only from the separate weighting is used here.

14.5 Analysis of Generalizability Using the PCI15K Data

14.5.1 RCT and Target Populations

As described in Chapter 3, the PCI15K dataset is a simulated dataset based on the Lindner study, which evaluated the addition of a blood thinner to usual care treatments. In Chapter 7, these data were used to compare the causal effects of the blood thinner on six-month CV-related costs. For this experiment, we split the PCI15K data into two samples: a hypothetical RCT and the remainder, which forms the target population. To create a difference between the RCT and target populations, patients with diabetes or a prior acute MI were selected with higher probability for the RCT. A total of 5,340 patients selected for the RCT were then randomized with 50% probability to the hypothetical treatments 0 and 1. This sample represents a hypothetical RCT comparing the addition of a blood thinner (treated) to standard of care (control) and is referred to as the RCT sample. The remaining N = 10,147 patients were treated as a large, heterogeneous observational sample representative of patients in actual practice, referred to as the target population.
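The construction of the two samples can be sketched as follows. This is a hedged Python illustration, not the code used to build the chapter's split: the selection probabilities, prevalence values, and total sample size below are illustrative assumptions.

```python
import random

random.seed(42)

# Toy stand-in for the PCI15K baseline data (prevalences are assumptions).
patients = [{"patid": i,
             "diabetic": random.random() < 0.21,
             "acutemi": random.random() < 0.11}
            for i in range(15487)]

# Oversample diabetic / prior acute-MI patients into the hypothetical RCT.
# These selection probabilities are illustrative, not the chapter's values.
for p in patients:
    p_select = 0.80 if (p["diabetic"] or p["acutemi"]) else 0.20
    p["RCTflag"] = 1 if random.random() < p_select else 0
    # RCT patients are randomized 1:1 to control (0) or treated (1).
    p["trt"] = random.randint(0, 1) if p["RCTflag"] else None

rct = [p for p in patients if p["RCTflag"] == 1]
target = [p for p in patients if p["RCTflag"] == 0]

def rate(sample, var):
    return sum(p[var] for p in sample) / len(sample)

# The RCT ends up enriched in diabetes and prior MI relative to the target
print(len(rct), round(rate(rct, "diabetic"), 2), round(rate(target, "diabetic"), 2))
```

Biased selection on the two covariates is what creates the generalizability gap that the re-weighting methods below are designed to close.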

The goal of the analysis was to assess the generalizability of the results from our hypothetical RCT regarding the difference in CV costs between the treatment groups. To do this, we re-weight the RCT patient population so that its characteristics match those of the target population. Table 14.1 describes the patient characteristics in our hypothetical RCT by treatment group. Due to the randomization, the treatment groups were well balanced on all pre-treatment covariates. Table 14.2 presents the patient characteristics of the RCT sample (treatment groups combined, labeled RCTflag = 1) next to the target population (RCTflag = 0). Due to the construction of these samples, the two populations were similar on all characteristics except for substantial differences in the rates of diabetes and acute MI.

Table 14.1: Summary of Population Characteristics: Hypothetical RCT Data

                       Trt = 0                     Trt = 1
Variable         N      Mean      Std        N      Mean      Std
stent           2650     0.65     0.48      2690     0.67     0.47
female          2650     0.34     0.47      2690     0.34     0.47
diabetic        2650     0.52     0.50      2690     0.53     0.50
acutemi         2650     0.26     0.44      2690     0.26     0.44
height          2650   171.98    10.14      2690   172.16    10.28
ejfract         2650    50.59     9.57      2690    50.63     9.58
ves1proc        2650     1.39     0.58      2690     1.37     0.59

Table 14.2: Summary of Population Characteristics: Target Population (RCTflag = 0) Versus RCT (RCTflag = 1)

                    RCTflag = 0                  RCTflag = 1
Variable         N       Mean      Std        N       Mean      Std
stent          10147      0.66     0.47      5340      0.66     0.47
female         10147      0.33     0.47      5340      0.34     0.47
diabetic       10147      0.05     0.22      5340      0.52     0.50
acutemi        10147      0.03     0.16      5340      0.26     0.44
height         10147    172.01     9.74      5340    172.07    10.21
ejfract        10147     51.97     8.58      5340     50.61     9.57
ves1proc       10147      1.32     0.61      5340      1.38     0.58

Because the RCT and target population had different rates of diabetes and prior acute MI, if there is heterogeneity of the treatment effect across these variables, then the results from the RCT might not be generalizable to the target population.

Prior to conducting the generalizability re-weighting, we first summarize the treatment effects from the simulated RCT. Because the treatment groups were balanced at baseline due to the randomization, a simple t test (PROC TTEST) was used to assess the causal treatment effect. Of course, a regression model (or a nonparametric approach, given the skewness of the cost outcome) may be a better analytic approach; however, because the focus here is on demonstrating the re-weighting, we chose the simplest analysis. Table 14.3 displays the output from the PROC TTEST analysis, which showed a non-statistically significant difference of $241 lower cost per patient in the treated group.
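The pooled t statistic in Table 14.3 can be reproduced directly from the group summary statistics. The short Python check below uses only the rounded values printed in the table, so the reconstructed quantities agree with the SAS output to the displayed precision.

```python
from math import sqrt

# Group summaries from Table 14.3
n0, mean0, sd0 = 2650, 15145.9, 8808.6   # control
n1, mean1, sd1 = 2690, 14905.1, 9185.5   # treated

# Pooled standard deviation and standard error of the mean difference
sp = sqrt(((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / (n0 + n1 - 2))
se_diff = sp * sqrt(1 / n0 + 1 / n1)
t = (mean0 - mean1) / se_diff

print(round(sp, 1), round(se_diff, 1), round(t, 2))
```

These match the pooled row of Table 14.3 (Std Dev 9000.4, Std Err 246.3, t = 0.98).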

Table 14.3: T test for 6-Month CV Related Costs from the Hypothetical RCT

trt          Method             N      Mean    Std Dev   Std Err   Minimum   Maximum
0                             2650   15145.9    8808.6     171.1    2452.0   73705.0
1                             2690   14905.1    9185.5     177.1    4586.0    178728
Diff (1-2)   Pooled                    240.8    9000.4     246.3
Diff (1-2)   Satterthwaite             240.8               246.3

trt          Method             Mean     95% CL Mean          Std Dev   95% CL Std Dev
0                            15145.9   14810.4   15481.5      8808.6   8577.7   9052.4
1                            14905.1   14557.9   15252.4      9185.5   8946.4   9437.8
Diff (1-2)   Pooled            240.8    -242.1     723.7      9000.4   8832.9   9174.5
Diff (1-2)   Satterthwaite     240.8    -242.0     723.6

Method          Variances       DF     t Value   Pr > |t|
Pooled          Equal         5338        0.98     0.3284
Satterthwaite   Unequal       5334.1      0.98     0.3282

Equality of Variances
Method      Num DF   Den DF   F Value   Pr > F
Folded F      2689     2649      1.09   0.0305

14.5.2 Inverse Probability Generalizability

Given the narrower population characteristics in the RCT, this section assesses the generalizability of the results from Section 14.5.1 using inverse probability weighting. Program 14.1 provides the SAS code for the analysis. First, the PSMATCH procedure was used to estimate the probability of being in the RCT from the pooled population of RCT and target population patients. Note that for this example, we separately weight the treated and control groups in the RCT to the target population; this is done by calling PROC PSMATCH twice in Program 14.1. With the output data set from PROC PSMATCH, the calculation of the weights and their re-scaling (to ensure the weights sum to the RCT sample size) is straightforward, following the equations of Section 14.3.1 and Program 14.1. Table 14.4 summarizes the standardized weights based on the probabilities produced by the PSMATCH procedure. Only patients in the RCT are summarized in Table 14.4 because patients in the target sample were given a weight of 1. Also, the mean weight is 1.0 in both treatment groups because each treatment group was separately re-weighted to the target population; when the combined groups are re-weighted, as in the next example, this will not necessarily be the case.
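The weight construction in Program 14.1 can be sketched numerically. The snippet below assumes the inverse-odds form of the generalizability weight from Section 14.3.1, w = (1 − e)/e with e the estimated probability of RCT membership; the propensity values are made up for illustration.

```python
def rescaled_weights(propensities):
    """Inverse-odds generalizability weights w = (1 - e)/e, re-scaled so
    they sum to the RCT sample size (equivalently, so the mean weight is 1)."""
    raw = [(1 - e) / e for e in propensities]
    mean_w = sum(raw) / len(raw)
    return [w / mean_w for w in raw]

# Illustrative sampling propensities e = P(in RCT | X) for five RCT patients
e = [0.9, 0.7, 0.5, 0.3, 0.2]
rwt = rescaled_weights(e)
print([round(w, 3) for w in rwt], round(sum(rwt), 1))
```

Patients who look least like typical trial enrollees (small e) receive the largest weights, which is what drives the variance inflation summarized by the effective sample size.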

Table 14.4: IPW Generalizability Re-Scaled Weights: Overall and by Treatment Group

Analysis Variable: rwt
   N       Mean        Std Dev     Minimum     10th Pctl   Median      90th Pctl   Maximum
 5340    1.0000000   1.6017123   0.0025313   0.0698822   0.1298692   4.2097088   8.0021533

Analysis Variable: rwt
trt      N      Mean    Std Dev   Minimum   10th Pctl   Median   90th Pctl   Maximum
0       2650   1.000     1.582     0.003      0.074      0.131     4.220      8.002
1       2690   1.000     1.621     0.003      0.069      0.127     4.197      5.498

No trimming was done because the maximum weight was not extreme; however, Lin et al. (under review) provide a detailed discussion of the trimming options that are available. The effective sample size was approximately 1,500, representing a substantial drop in power due to the re-weighting.
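The effective sample size quoted above is Kish's approximation, ESS = (Σw)²/Σw², which Program 14.1 computes from rwt and rwtsq. A small Python sketch with assumed weights shows the calculation, along with the simple weight-capping form of trimming (one of the options discussed by Lin et al.):

```python
def kish_ess(weights):
    """Kish approximate effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

def trim(weights, cap):
    """Cap (truncate) weights at a fixed value -- one simple trimming option."""
    return [min(w, cap) for w in weights]

# Illustrative weights: mostly small, a few large (a pattern like Table 14.4)
w = [0.1] * 80 + [1.0] * 15 + [8.0] * 5

print(round(kish_ess(w), 1))             # far below the nominal n = 100
print(round(kish_ess(trim(w, 4.0)), 1))  # capping the large weights raises the ESS
```

Trimming trades a reduction in variance (higher ESS) for some bias in the target population represented by the weights.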

The feasibility and balancing programs of Chapter 5 were used to assess the balance between the weighted RCT covariates and the target population values (relative to the balance prior to weighting), as well as between the two treatment groups within the RCT population. Figures 14.1 and 14.2 summarize the balance. Figure 14.1 shows that re-weighting the RCT does produce a population similar to the target population: standardized differences for all covariates between the weighted RCT and the target population were small. Figure 14.2 displays the impact of the weighting on the balance between the randomized treatments within the RCT. Note that the differences between the two randomized groups can be increased by the weighting; in this case, the differences remain small (<0.1). Regardless, we use regression adjustment when analyzing the outcome to account for any residual imbalance.
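The diagnostic plotted in Figures 14.1 and 14.2 is the weighted standardized difference. A minimal Python version is sketched below; the pooled-standard-deviation denominator is one common convention (an assumption here), and the data are invented for illustration.

```python
from math import sqrt

def wmean(x, w):
    """Weighted mean of x with weights w."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def wvar(x, w):
    """Weighted variance of x with weights w."""
    m = wmean(x, w)
    return sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sum(w)

def std_diff(x1, w1, x2, w2):
    """Weighted standardized difference between two samples,
    using the pooled standard deviation as the denominator."""
    pooled_sd = sqrt((wvar(x1, w1) + wvar(x2, w2)) / 2)
    return (wmean(x1, w1) - wmean(x2, w2)) / pooled_sd

# Illustrative data: re-weighted RCT values of one binary covariate
# versus the target population, where every patient keeps weight 1
rct_x, rct_w = [1, 0, 1, 1, 0, 0], [0.5, 1.2, 0.8, 0.9, 1.6, 1.0]
tgt_x, tgt_w = [1, 0, 0, 0, 1, 0], [1.0] * 6
print(round(std_diff(rct_x, rct_w, tgt_x, tgt_w), 3))
```

Values below the conventional 0.1 threshold, as in Figures 14.1 and 14.2, are usually read as adequate balance.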

Figure 14.1: Balance Check After IPW Re-weighting: Standardized Differences for RCT Versus Target Population

Figure 14.2: Balance Check After IPW Re-weighting: Standardized Differences for Treatment Groups Within the RCT Population

A comparison of weighted outcomes then provides an estimate of the effect of treatment had the population in the RCT looked like the target population. Table 14.5 presents the weighted regression analysis (treatment as the only term in the model) using the GENMOD procedure with a WEIGHT statement and the sandwich variance estimator to account for the weighting. After re-weighting, there is an estimated mean reduction in costs of $947 for patients in the treated group (p = .085). Table 14.6 repeats this analysis but further adjusts for baseline covariates in the regression model; the estimated reduction in cost was similar and the inference unchanged.

Table 14.5: Inverse Probability Weighted Generalizability Analysis (Estimated Treatment Difference in the Target Population): Simple Weighted Model

Model Information
Data Set                      WORK.PS_RCT
Distribution                  Normal
Link Function                 Identity
Dependent Variable            cardcost
Scale Weight Variable         rwt
Number of Observations Read   5340
Number of Observations Used   5340
Sum of Weights                5340

Analysis Of GEE Parameter Estimates (Empirical Standard Error Estimates)
Parameter     Estimate    Standard Error    95% Confidence Limits       Z      Pr > |Z|
Intercept     15190.52      429.4775       14348.76     16032.28      35.37     <.0001
trt 0           947.1457     550.6323       -132.074     2026.365      1.72     0.0854
trt 1             0.0000       0.0000          0.0000       0.0000       .      .

trt Least Squares Means
trt    Estimate   Standard Error   z Value   Pr > |z|
0       16138        344.59         46.83     <.0001
1       15191        429.48         35.37     <.0001

Differences of trt Least Squares Means
trt   _trt   Estimate   Standard Error   z Value   Pr > |z|
0      1      947.15        550.63         1.72     0.0854

Table 14.6: Inverse Probability Weighted Generalizability Analysis (Estimated Treatment Difference in the Target Population): Weighted Regression Model

Model Information
Data Set                      WORK.PS_RCT
Distribution                  Normal
Link Function                 Identity
Dependent Variable            cardcost
Scale Weight Variable         rwt
Number of Observations Read   5340
Number of Observations Used   5340
Sum of Weights                5340

Analysis Of GEE Parameter Estimates (Empirical Standard Error Estimates)
Parameter     Estimate    Standard Error    95% Confidence Limits       Z      Pr > |Z|
Intercept     33061.39     5983.424        21334.10     44788.69       5.53     <.0001
trt 0           892.0205     551.5807       -189.058     1973.099      1.62     0.1058
trt 1             0.0000       0.0000          0.0000       0.0000       .      .
height          -67.2148      32.3714       -130.662       -3.7680    -2.08     0.0379
ejfract        -110.026       34.9206       -178.469      -41.5829    -3.15     0.0016
stent          -641.854      700.1620      -2014.15       730.4384    -0.92     0.3593
diabetic       -957.047      383.0181      -1707.75      -206.346     -2.50     0.0125
acutemi       -3426.87       382.3708      -4176.30     -2677.43      -8.96     <.0001
female           85.4703     779.3683      -1442.06      1613.004      0.11     0.9127

trt Least Squares Means
trt    Estimate   Standard Error   z Value   Pr > |z|
0       15033        294.40         51.06     <.0001
1       14141        308.00         45.91     <.0001

Differences of trt Least Squares Means
trt   _trt   Estimate   Standard Error   z Value   Pr > |z|
0      1      892.02        551.58         1.62     0.1058

14.5.3 Entropy Balancing Generalizability

This section assesses the generalizability of the results from Section 14.5.1 using a different weighting approach, entropy balancing. As discussed in Section 14.3.2, entropy balancing can generate weights that exactly match the patient characteristics of the target population. Rather than estimating a propensity model for the weights as in Section 14.5.2, this approach uses an algorithm that searches for an optimal set of weights (if such a set exists). In addition, patient-level data for the target population are not needed because summary statistics are sufficient. SAS code for the entropy balancing algorithm was provided as Program 8.3. In Program 14.2, we call the entropy balancing algorithm for the purpose of assessing generalizability.

Note that Program 14.2 calls the entropy balancing algorithm separately for each treatment group in the RCT (it also illustrates a single call with the treatment groups pooled, though only the separate-group results are used here). The weights by definition sum to the total sample size, and thus no re-scaling of the weights is necessary. Table 14.7 summarizes the weighted values of each covariate for the RCT (columns ending in "co") relative to the target population values (columns ending in "ca"), demonstrating that the algorithm found a solution that exactly matches both means and variances. The table shown is for the control group, but the table for the treated group is identical. Table 14.8 summarizes the entropy balancing weights in each treatment group. The mean weight in each group is 1 by design of the algorithm, and the maximum weights are under 5, so no trimming was considered.
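To see how the entropy balancing solution works, consider the simplest case of a single binary covariate with only a first-moment constraint. The weights then take the exponential-tilting form w_i ∝ exp(λx_i), and λ can be found by one-dimensional search. The stdlib-only Python sketch below is a toy version of what the %ebc macro does (the macro additionally balances second moments and handles many covariates via PROC OPTMODEL):

```python
from math import exp

def entropy_balance(x, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Exponential-tilting weights w_i proportional to exp(lam * x_i), with
    lam found by bisection so the weighted mean of x equals target_mean.
    Weights are then scaled to have mean 1 (sum to n), as in the chapter."""
    def weighted_mean(lam):
        w = [exp(lam * xi) for xi in x]
        return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

    for _ in range(iters):  # the weighted mean is monotone increasing in lam
        mid = (lo + hi) / 2
        if weighted_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [exp(lam * xi) for xi in x]
    mean_w = sum(w) / len(w)
    return [wi / mean_w for wi in w]

# Re-weight an "RCT" covariate with mean 0.5 to match a "target" mean of 0.2
x = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
w = entropy_balance(x, 0.2)
print(round(sum(wi * xi for wi, xi in zip(w, x)) / sum(w), 4))  # prints 0.2
```

Because the tilt is exponential, every weight stays strictly positive, which mirrors the small positive lower bound (wbnd) in the macro call.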

Table 14.7: Summary of Balance Produced by Entropy Balancing Weights

VarName       mean_ca     mean_co     variance_ca   variance_co
ejfract      51.96965    51.96965      73.61469      73.61469
height       172.0142    172.0142      94.77843      94.77843
stent0       0.341382    0.341382      0.224862      0.224925
female0      0.674879    0.674879      0.219439      0.2195
diabetic0    0.94964     0.94964       0.047828      0.047842
acutemi0     0.974278    0.974278      0.025063      0.02507

Table 14.8: Summary of Entropy Balancing Weights by Treatment Group

Analysis Variable: w_eb
thin   N Obs     N       Mean        Std Dev     Minimum     10th Pctl   Median      90th Pctl   Maximum
0       2650    2650   1.0000000   1.4836075   0.0031758   0.0979129   0.1122142   3.4835083   4.3640433
1       2690    2690   1.0000000   1.5348928   0.0029998   0.0911182   0.1139450   3.6473073   4.8850281

Though exact balance was achieved between the weighted RCT population and the target population, one should confirm that the re-weighting did not create major imbalances in covariates between the two treatment groups within the RCT. Figure 14.3 presents this balance check. While the balance is indeed worsened (as expected), the scale of the X axis shows that all standardized differences remain within the 0.1 range of reasonable balance.

Figure 14.3: Balance Check After Entropy Balancing: Standardized Differences for Treatment Groups Within the RCT Population

Using the computed entropy weights for each patient in the RCT, we then conducted a weighted analysis using the GENMOD procedure (the same analysis code as for the inverse probability weighted analysis) to assess treatment differences. Table 14.9 provides the results. Including adjustment for the pre-treatment characteristics in the model again made little difference, and those results are not shown for brevity.

The results from this entropy weighted analysis were very similar to those from the IPW approach (estimated treatment effects of $798, p = .1292, from entropy balancing and $947, p = .0854, from IPW) and led to the same inferences. Note, however, that both analyses have wide confidence intervals, given the high variability of cost data and the additional variability introduced by the weighting. Compared to the hypothetical RCT results in Table 14.3, the estimated treatment differences were slightly larger, though all analyses failed to show a statistically significant difference in costs between the treatment groups.

Table 14.9: Entropy Balancing Weighted Generalizability Analysis (Estimated Treatment Difference in the Target Population): Simple Weighted Model

Model Information
Data Set                      WORK.EB
Distribution                  Normal
Link Function                 Identity
Dependent Variable            cardcost
Scale Weight Variable         w_eb
Number of Observations Read   5340
Number of Observations Used   5340
Sum of Weights                5340

Analysis Of GEE Parameter Estimates (Empirical Standard Error Estimates)
Parameter     Estimate    Standard Error    95% Confidence Limits       Z      Pr > |Z|
Intercept     15645.46      403.0507       14855.49     16435.42      38.82     <.0001
thin 0          797.5467     525.6211       -232.652     1827.745      1.52     0.1292
thin 1            0.0000       0.0000          0.0000       0.0000       .      .

thin Least Squares Means
thin   Estimate   Standard Error   z Value   Pr > |z|
0       16443        337.38         48.74     <.0001
1       15645        403.05         38.82     <.0001

Differences of thin Least Squares Means
thin   _thin   Estimate   Standard Error   z Value   Pr > |z|
0       1       797.55        525.62         1.52     0.1292

14.6 Summary

In this chapter, we presented the use of inverse probability weighting and entropy balancing as tools to quantitatively assess the generalizability of RCT results. These tools allow for the specification of a target population, typically based on real world data sources because of their broad generalizability. Such methods are particularly valuable at the launch of a new medication, when RCT data exist and target populations are clear but no observational data with the compound are yet available. Of course, these methods rely on the core assumptions discussed in this chapter and only address differences in populations between the RCT and the target population. Unmeasured confounding, or issues such as differences in patient behavior in usual care versus during participation in a trial, are not accounted for.

Two generalizability approaches were demonstrated in this chapter: inverse probability weighting and entropy balancing. Entropy balancing is particularly promising given that it produces exact balance and can be applied whether or not one has individual patient data for the target population. We split the PCI15K data set into a hypothetical RCT and a large, heterogeneous real world target population and applied both methods to evaluate the generalizability of the RCT. Results from the two methods were very similar, and both demonstrated how these methods can identify important generalizability issues when there is heterogeneity of outcomes and a narrow RCT population. SAS code was provided to allow easy implementation of these weighting-based generalizability methods.

References

Austin PC, Stuart EA (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine 34, 3661–3679.

Chen Z, Kaizar E (2018). On variance estimation for generalizing from a trial to a target population. arXiv 1704.07789.

Cole SR, Hernán MA (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology 168(6):656-664.

Cole SR, Stuart EA (2010). Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 trial. Am J Epidemiol. 172:107-115.

Cooper H, Hedges LV (2009). Research synthesis as a scientific process. The Handbook of Research Synthesis and Meta-Analysis, 2nd Ed. Russell Sage Foundation.

Dahabreh IJ, Hernán MA (2019). Extending inferences from a randomized trial to a target population. European Journal of Epidemiology 34:719–722.

Didden EM, Ruffieux Y, Hummel N, Efthimiou O, Reichenbach S, Gsteiger S, Finckh A, Fletcher C, Salanti G, Egger M, on behalf of IMI GetReal Work Package 4 (2018). Prediction of Real-World Drug Effectiveness Prelaunch: Case Study in Rheumatoid Arthritis. Medical Decision Making 38(6):719–729.

Droitcour J, Silberman G, Chelimsky E (1993). Cross-design synthesis: A New Form of Meta-analysis for Combining Results from Randomized Clinical Trials and Medical-Practice Databases. International Journal of Technology Assessment in Health Care 9(3):440-9.

Framework for FDA's Real-World Evidence Program. U.S. Food and Drug Administration. https://www.fda.gov/media/120060/download.

Green LW, Glasgow RE (2006). Evaluating the relevance, generalization, and applicability of research: Issues in external validation and translation methodology. Evaluation and the Health Professions 29, 126–153.

Kaizar EE (2011). Estimating treatment effect via simple cross design synthesis. Statistics in Medicine 30, 2986–3009.

Kennedy-Martin T, Curtis S, Faries D, Robinson S, Johnston J (2015). A literature review on the representativeness of randomized controlled trial samples and implications for the external validity of trial results. Trials 16, 1–14.

Lesko CR, Buchanan AL, Westreich D, Edwards JK, Hudgens MG, Cole SR (2017). Generalizing Study Results: A Potential Outcomes Perspective. Epidemiology 28, 553–561.

Lin CY, Kaizar E, Faries D, Johnston J (under review). A Comparison of Generalization Reweighting Estimators of Average Treatment Effects in Real-World Populations.

Mumford SL and Schisterman EF (2019). New methods for generalizability and transportability: the new norm. European Journal of Epidemiology 34:723–724.

Murad MH, Katabi A, Benkhadra R, Montori VM (2018). External validity, generalisability, applicability and directness: a brief primer. BMJ evidence-based medicine 23(1):17-19.

Nguyen TQ, Ackerman B, Schmid I, Cole SR, Stuart EA (2018). Sensitivity analyses for effect modifiers not observed in the target population when generalizing treatment effects from a randomized controlled trial: Assumptions, models, effect scales, data scenarios, and implementation details. PLoS ONE 13(12):1-18.

Pressler TA, Kaizar EE (2013). The use of Propensity Scores and Observational Data to Estimate Randomized Controlled Trial Generalizability Bias. Statistics in Medicine 32:3553-3568.

Signorovitch JE, Sikirica V, Erder MH, Xie J, Lu M, Hodgkins PS, Betts KA, Wu EQ (2012). Matching-adjusted Indirect Comparisons: a New Tool for Timely Comparative Effectiveness Research. Value in Health 15(6):940-7.

Stuart EA, Bradshaw CP, Leaf PJ (2015). Assessing the Generalizability of Randomized Trial Results to Target Populations. Prev Sci. 16(3):475-85.

Westreich D, Edwards JK, Lesko CR, Stuart E, Cole SR (2017). Transportability of Trial Results Using Inverse Odds of Sampling Weights. Am J Epidemiol 186(8):1010-1014.
