Emulating the GRADE trial using real world data: retrospective comparative effectiveness study

Setting OptumLabs® Data Warehouse (OLDW), a nationwide claims database in the US, 25 January 2010 to 30 June 2019.

Conclusions In this emulation of the GRADE trial, liraglutide was statistically significantly more effective at maintaining glycemic control than glimepiride or sitagliptin when added to metformin monotherapy. Generating timely evidence on medical treatments using real world data as a complement to prospective trials is of value.

Individual patient level age, sex, race or ethnicity, and annual household income were identified from OLDW enrollment files at the index date. A detailed description of the source data for these variables is available in the supplemental methods. Comorbidities (ascertained from all claims during the six months preceding the index date) included retinopathy, nephropathy, neuropathy, coronary artery disease, cerebrovascular disease, peripheral vascular disease, heart failure, and previous severe hypoglycemia and hyperglycemia, as detailed in supplemental table S2. Specialties of treating physicians were categorized as primary care, endocrinology, cardiology, nephrology, other, and unknown. Baseline drugs, included as surrogates for burden of complications, were identified from fills in the six months preceding the index date (see supplemental table S3).

Baseline characteristics in weighted cohort. Values are numbers (percentages) unless stated otherwise

All primary analyses were conducted using the per protocol censoring approach for the primary outcome and for the secondary outcomes of secondary metabolic failure and insulin initiation, censoring at the time of treatment drug discontinuation, disenrollment from the health plan, end of study period, or death, whichever came first (see supplemental figure S3). Time receiving treatment for each drug was determined by calculating continuous coverage episodes based on available fills, the same as for baseline metformin treatment. Remaining secondary outcomes were analyzed using the intention-to-treat censoring approach, censoring the participant at the time of health plan disenrollment, end of study, or death, whichever came first. P<0.05 was considered statistically significant for all two sided tests. All analyses were performed using SAS 9.4 (SAS Institute, Cary, NC) and R version 4.0.2 (R Foundation).
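As an illustration of how continuous coverage episodes might be derived from pharmacy fills, the following is a minimal Python sketch, not the study's actual SAS or R code. The function name `coverage_episodes`, the input layout (fill date, days supplied), and the 30 day allowable gap (`grace_days`) are assumptions made for the example; the permissible gap used in the study is not stated in this section. The sketch also ignores stockpiling from early refills.

```python
from datetime import date, timedelta


def coverage_episodes(fills, grace_days=30):
    """Merge (fill_date, days_supply) pairs into continuous coverage episodes.

    grace_days is a hypothetical allowable gap between the end of one supply
    and the next fill; it is an assumption, not a value taken from the study.
    Returns a list of (episode_start, episode_end) tuples in date order.
    """
    episodes = []
    for fill_date, days_supply in sorted(fills):
        run_out = fill_date + timedelta(days=days_supply)
        if episodes and fill_date <= episodes[-1][1] + timedelta(days=grace_days):
            # The fill arrives before the grace period expires: extend the episode.
            start, end = episodes[-1]
            episodes[-1] = (start, max(end, run_out))
        else:
            # Gap too long (or first fill): start a new episode.
            episodes.append((fill_date, run_out))
    return episodes


# Hypothetical patient: two closely spaced fills, then a long gap.
example_fills = [
    (date(2015, 1, 1), 30),
    (date(2015, 2, 5), 30),
    (date(2015, 6, 1), 30),
]
example_episodes = coverage_episodes(example_fills)
# Under these assumptions, discontinuation is the end of the first episode.
discontinuation_date = example_episodes[0][1]
```

In this sketch the end of the first episode serves as the treatment discontinuation date that the per protocol approach censors on.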

First, to examine the comparative effectiveness of study drugs while treated only with them and not with any other drug, accounting for real world treatment practices, we repeated all analyses using the as treated censoring approach, censoring at the time a new drug class was added, the assigned drug was discontinued, health plan disenrollment, end of study, or death, whichever came first (see supplemental figure S3). Second, we assessed residual confounding by testing a falsification endpoint that was unlikely to be associated with the studied drugs: diagnosis of pneumonia (see supplemental table S2) during the follow-up period.
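The three censoring approaches described in the methods differ only in which events end follow-up. A minimal Python sketch of that rule is given below; the dictionary keys, the function name `censor_date`, and the example patient are hypothetical, and only the study end date (30 June 2019) comes from the article.

```python
from datetime import date

# Events that end follow-up under each approach, as described in the text.
CENSORING_EVENTS = {
    "intention_to_treat": ("disenrollment", "study_end", "death"),
    "per_protocol": ("discontinuation", "disenrollment", "study_end", "death"),
    "as_treated": (
        "new_class_added",
        "discontinuation",
        "disenrollment",
        "study_end",
        "death",
    ),
}


def censor_date(approach, dates):
    """Return the earliest applicable event date for one patient.

    `dates` maps event names to dates; events that never occurred are
    either absent or set to None.
    """
    observed = [
        dates[name]
        for name in CENSORING_EVENTS[approach]
        if dates.get(name) is not None
    ]
    return min(observed)


# Hypothetical patient: another drug class added before discontinuation.
patient = {
    "new_class_added": date(2017, 3, 1),
    "discontinuation": date(2017, 9, 15),
    "disenrollment": date(2018, 12, 31),
    "study_end": date(2019, 6, 30),
    "death": None,
}
```

For this hypothetical patient, follow-up ends earliest under the as treated approach, later under the per protocol approach, and latest under the intention-to-treat approach.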

Patients were not involved in the design, conduct, or dissemination of this study. However, this study was informed by the need to identify preferred glucose lowering treatment strategies in the absence of direct comparisons across the examined drugs; and to examine whether and how data collected in the process of routine patient care can be used to emulate prospective clinical trials. Because this study seeks to inform drug regulatory policy and procedures, investigators from the FDA contributed to the design of the study and interpretation of study findings; they are included as coauthors on this publication.

Supplemental table S9 shows baseline characteristics of the included individuals before weighting. Across the four treatment groups, there were significant differences (largest standardized mean difference >0.2) in age, race or ethnicity, annual household income, and prescribing physician specialty. Individuals in the liraglutide arm were more likely to be younger, white, on a higher income, and treated by an endocrinologist than those in the other treatment arms. Individuals in the glargine arm were most likely to be on a low income and they had the highest prevalence of all examined comorbidities.

The glargine arm was excluded from all analyses because of small sample size (n=251, weighted n=179) and inability to achieve good control of confounders after weighting. The propensity score model was estimated on the remaining three groups. After weighting, mean participant ages were 62.0 years (standard deviation (SD) 11.1 years) in the glimepiride arm, 62.0 (SD 11.0) years in the sitagliptin arm, and 60.5 (SD 10.4) years in the liraglutide arm. Women comprised 48.2% (2009 of 4168) of the glimepiride arm, 49.1% (1374 of 2800) of the sitagliptin arm, and 50.5% (289 of 572) of the liraglutide arm. White people comprised 64.7% (2695 of 4168), 64.2% (1798 of 2800), and 65.8% (376 of 572) of the treatment arms, respectively. Mean baseline HbA1c levels were 7.63% (SD 0.48%), 7.61% (SD 0.47%), and 7.60% (SD 0.48%), respectively. Supplemental table S10 presents the pairwise standardized mean differences for all baseline covariates; all values were <0.2.
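To make the balance criterion concrete, the sketch below computes a standardized mean difference from the weighted summary statistics reported above. It assumes the conventional definition (difference in means or proportions divided by the square root of the average of the two group variances); the exact formula used for supplemental table S10 is not given in this section, so the values are illustrative only. The function names are hypothetical.

```python
import math


def smd_continuous(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference for a continuous covariate.

    Assumes the conventional pooled form: |difference in means| divided by
    the square root of the average of the two variances.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


def smd_binary(p_a, p_b):
    """Standardized mean difference for a binary covariate (proportions)."""
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return abs(p_a - p_b) / pooled_sd


# Age after weighting: glimepiride 62.0 (SD 11.1) v liraglutide 60.5 (SD 10.4).
age_smd = smd_continuous(62.0, 11.1, 60.5, 10.4)

# Proportion of women after weighting: glimepiride 48.2% v liraglutide 50.5%.
sex_smd = smd_binary(0.482, 0.505)

print(f"age SMD: {age_smd:.3f}, sex SMD: {sex_smd:.3f}")
```

Under these assumptions both values fall below the 0.2 threshold quoted in the text, with age showing the larger residual difference of the two.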

Cumulative incidence rates of primary metabolic failure in propensity score weighted individuals included in the study

Time to secondary metabolic failure was longest in the liraglutide arm (see supplemental figure S5). Liraglutide was associated with lower risk of secondary metabolic failure compared with glimepiride (0.61, 0.43 to 0.87) and sitagliptin (0.59, 0.41 to 0.85). By one year, the estimated cumulative incidence rates of secondary metabolic failure were 0.11 (95% confidence interval 0.06 to 0.17) in the liraglutide arm, 0.20 (0.19 to 0.22) in the glimepiride arm, and 0.22 (0.19 to 0.24) in the sitagliptin arm. The difference in event rates persisted at two years.
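One common way to obtain a cumulative incidence estimate in a propensity score weighted cohort is one minus a weighted Kaplan-Meier survival estimate. The sketch below illustrates that idea on toy data; it is an assumption that an estimator of this kind applies here, since this section does not specify how the reported cumulative incidence rates were computed. The function name and data are hypothetical, and competing risks are ignored.

```python
def weighted_cumulative_incidence(times, events, weights, horizon):
    """Return 1 minus the weighted Kaplan-Meier survival at `horizon`.

    times   : follow-up time for each person (event or censoring time)
    events  : 1 if the outcome occurred at that time, 0 if censored
    weights : propensity score weight for each person
    """
    event_times = sorted(
        {t for t, e in zip(times, events) if e and t <= horizon}
    )
    survival = 1.0
    for t in event_times:
        # Weighted number still under observation just before time t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        # Weighted number experiencing the outcome at time t.
        failed = sum(
            w for ti, e, w in zip(times, events, weights) if e and ti == t
        )
        survival *= 1.0 - failed / at_risk
    return 1.0 - survival


# Toy data (hypothetical): four people, two events, two censored.
toy_times = [1, 2, 3, 4]
toy_events = [1, 0, 1, 0]

unweighted = weighted_cumulative_incidence(toy_times, toy_events, [1, 1, 1, 1], 3)
weighted = weighted_cumulative_incidence(toy_times, toy_events, [2, 1, 1, 1], 3)
```

Giving the first person a weight of 2 raises the estimate because that person's early event then counts for a larger share of the weighted risk set.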

Insulin was started by 84 of 4168 (2.0%) people in the glimepiride arm, 11 of 572 (1.9%) in the liraglutide arm, and 50 of 2800 (1.8%) in the sitagliptin arm, with no significant difference among the three groups. Overall, 37 patients experienced emergency department visits or hospital admissions for hypoglycemia during the study period, including <11 in the liraglutide and sitagliptin arms, precluding statistical analyses.

Heart failure, end stage kidney disease, pancreatitis, pancreatic cancer, thyroid cancer, and all cause mortality could not be analyzed owing to too few (<11) events in all treatment groups (supplemental table S11 presents the event rates). No statistically significant differences were observed between groups for major adverse cardiovascular events, retinopathy, neuropathy, other cardiovascular events, cancer, and all cause admissions to hospital (see supplemental table S12).

Another glucose lowering drug was added before discontinuation of the assigned treatment in 423 of 4168 (10%) people in the glimepiride arm, 237 of 572 (41%) in the liraglutide arm, and 419 of 2800 (15%) in the sitagliptin arm. Sensitivity analyses using the as treated censoring approach were consistent with the primary analyses (see supplemental figure S6 and table S15). No significant differences were observed among the treatment groups for the pneumonia falsification endpoint (see supplemental table S16).

Our study is strengthened by the application of advanced analytic methods that account for measured differences between treatment arms that would otherwise confound analyses and preclude causal inference. The generalized boosted models used for the propensity score are more flexible and less sensitive to model misspecification than logistic regression. The large and diverse population within OLDW made emulation efforts uniquely possible despite the narrow eligibility criteria specified by the GRADE trial.

Better understanding of the comparative effectiveness and safety of second line glucose lowering drugs is urgently needed to inform shared decision making in diabetes. Ultimately, the population included in this study and our findings should be compared with those of the GRADE trial, once published in peer reviewed literature, to assess the fidelity and generalizability of results and to improve our understanding of the use of real world data to emulate clinical trials.

All study data are deidentified consistent with Health Insurance Portability and Accountability Act of 1996 (HIPAA) expert deidentification determination. The study was therefore exempt from review by the Mayo Clinic institutional review board.

This study was conducted using deidentified claims data from OptumLabs Data Warehouse. Raw data are not publicly available. The study protocol, code sets, and statistical analysis plan are available online.40

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/.