Making the (Regulatory) Grade With Real-World Data
By Deborah Borfitz
March 12, 2020 | How to assess the quality of real-world data (RWD) for research purposes, and how to apply it to add efficiencies to clinical trials and support claims about drug product effectiveness, were top topics of interest at the recent Summit for Clinical Ops Executives (SCOPE) in Orlando. Many providers in the fast-growing RWD landscape are focused on a specific therapeutic area, including Flatiron Health. In addition to offering an oncology-specific electronic health record (EHR), the company has partnered with the U.S. Food and Drug Administration (FDA) to find ways to integrate EHR-derived evidence into regulatory decision-making.
Data quality was the focus of a presentation by Emily Castellanos, M.D., associate medical director for research oncology at Flatiron Health. “The strength of real-world evidence [RWE] has to do with the fitness of the underlying real-world data,” she says, and “fitness” can be gauged by the data’s relevance and quality.
Relevant data are robust, representative of the population of interest and determined by the question at hand, Castellanos says. EHRs would be the relevant data sources for questions about treatment outcomes, while personal digital health apps and patient-generated data would be appropriate for understanding patient-reported outcomes, and administrative claims data would be suitable for pharmacoepidemiology studies. EHR and claims data would both be relevant to questions about cost-effectiveness.
Data that are relevant must also include all the covariates necessary for analysis, she continues. “Linked data may be needed, so all data fields for the linkages have to be available.” Additionally, the data must be representative of the population of interest, there needs to be enough of it to adequately power the study design after the inclusion/exclusion criteria have been applied, and ample time must have passed to see the outcome of interest.
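The “enough data to power the study” check can be sketched quantitatively. A minimal illustration, assuming a two-arm comparison of response proportions using the standard normal-approximation sample-size formula; the response rates, significance level, and cohort count here are hypothetical:

```python
import math
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect p1 vs. p2 (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical: detecting a 50% vs. 65% response rate at 80% power.
needed = n_per_arm(0.50, 0.65)

# Hypothetical cohort size remaining after inclusion/exclusion criteria.
eligible = 1200
adequately_powered = eligible >= 2 * needed
```

The same comparison runs in reverse during feasibility work: given the patients who survive the inclusion/exclusion filter, is the detectable effect size acceptably small?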
Data Quality Attributes
Data that pass the relevancy test must next be assessed for quality to ensure the information can answer the question of interest “accurately, reliably and repeatedly,” says Castellanos. Accuracy is closely tied to how logically believable and consistent the data are and how well they conform to pre-specified standards.
Assessing RWD accuracy involves biological (age, pathology, lab values), temporal (logical sequencing of events, dates are probable) and clinical plausibility (alignment with external standards, downstream correlation with outcomes) testing, Castellanos says. Treatment start dates in close proximity to advanced diagnostics would establish patients’ first-line therapy, for example, and imaging showing disease progression would logically be followed by a change in treatment—or discontinuation of therapy and referral to hospice.
Consistency in abstracted data may be assessed by measuring agreement between different abstractors, which tends not to reach 100%, she says. Alternatively, it can be evaluated by seeing how well the same abstractors agree with themselves over time. Pathology and radiology reports and clinical notes may need to be put in a structured format using natural language processing, she adds.
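Inter-abstractor agreement of this kind is commonly summarized with a chance-corrected statistic such as Cohen’s kappa. The article does not specify Flatiron’s metric; this is a generic sketch with toy labels:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement if each rater labeled independently at their own rates.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two abstractors coding disease status from the same five charts (toy data).
abstractor_1 = ["progressed", "stable", "stable", "progressed", "stable"]
abstractor_2 = ["progressed", "stable", "progressed", "progressed", "stable"]
kappa = cohens_kappa(abstractor_1, abstractor_2)  # ~0.62: moderate agreement
```

Raw percent agreement here is 80%, but kappa discounts the agreement the raters would reach by chance alone, which is why abstraction studies report it rather than the raw figure.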
Conformance is “highly related to interoperability of structured data,” Castellanos says. “Unstructured data must map to a structured data field for each patient and element.”
Mortality is a critical endpoint in overall survival analysis, she says. Flatiron Health has developed a composite endpoint that amalgamates four data sources—date-of-death certificates, unstructured documents, a commercial death dataset with a next-generation linking algorithm and the Social Security Death Index—assessed against the National Death Index as a gold standard. Combining the multiple data sources demonstrably improves accuracy.
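One way to picture a composite endpoint like this: take the first available death date across prioritized sources for each patient, then benchmark against the gold standard. A hypothetical sketch; the source names, priority order, and toy data are illustrative, not Flatiron’s actual algorithm:

```python
# Illustrative source priority order; not Flatiron's actual algorithm.
SOURCES = ["death_certificate", "unstructured_docs", "commercial_linked", "ssdi"]

# Toy patients: death dates found per source (absent = not found).
patients = {
    "p1": {"death_certificate": "2019-04-02", "ssdi": "2019-04-02"},
    "p2": {"commercial_linked": "2018-11-15"},
    "p3": {},  # no death recorded in any source
    "p4": {"unstructured_docs": "2020-01-20"},
}

def composite_death_date(record):
    """Take the first non-missing death date by source priority."""
    for source in SOURCES:
        if source in record:
            return record[source]
    return None

# Benchmark against a toy gold-standard registry (e.g., National Death Index).
gold = {"p1": "2019-04-02", "p2": "2018-11-15", "p3": None, "p4": "2020-01-20"}
composite = {pid: composite_death_date(rec) for pid, rec in patients.items()}
true_deaths = [pid for pid, date in gold.items() if date is not None]
sensitivity = sum(composite[pid] == gold[pid] for pid in true_deaths) / len(true_deaths)
```

The intuition is that each source misses different deaths, so the union recovers cases any single source would drop, which is what the benchmark against the gold standard quantifies.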
Completeness is also important, she adds, since missing data introduces potential bias. For a variety of reasons, RWD may have lower completeness than prospectively collected data in a randomized controlled trial (RCT)—e.g., data may be present but not available, data is sourced from multiple providers and sites of care, and clinical documentation varies physician to physician. One reason for incompleteness of data is that cancer patients don’t always have their ECOG performance status (a trial inclusion/exclusion criterion) documented, she notes.
Conclusions based on RWD may not be meaningful if critical variables don’t meet a pre-defined quality metric requirement, Castellanos says. The criticality of a variable depends on how much “missingness” can be tolerated, she adds. Provenance must still be maintained, and the data mapped against source documentation to show its basis in truth.
Evaluating RWD Vendors
Two years ago, AstraZeneca embarked on a project to leverage RWD to support clinical trial research, according to Xia Wang, Ph.D., director of health informatics and global medicines development. Her team has been using artificial intelligence (AI) and analytics to provide on-demand insights to clinical teams in the areas of disease and treatment pathway, protocol design, study feasibility, trial interpretation, trial management, and patient recruitment.
RWD is directly available via licensing agreements with a broad range of big-data companies, as well as collaborations with TriNetX, InSite and Optum Clinformatics for Clinical Trials (OCCT) where data gets accessed remotely, says Wang. AstraZeneca is also assessing additional RWD vendors to “fill in the gaps… for example, patient phenotype [data] to predict the onset of disease.”
Project objectives included identifying the strengths and limitations of current data vendors, as well as additional vendors with potential new capabilities, says Wang. AstraZeneca was also looking to build a capabilities assessment framework, conduct vendor evaluations and provide recommendations. Lastly, it wanted to explore pilots that could “deliver impact” and identify study teams for engagement.
AstraZeneca’s RWD vendor evaluation framework has eight parts:
Coverage/Quantity – patient coverage, sample size, representativeness
Granularity/Depth – patient-level data, diagnosis, procedures, lab tests, quality of life, observations, outcomes
Accessibility – data access and usage limitations, raw data sharing, data privacy and regulatory compliance issues
Quality/Reputation – richness of the data, origin of the data, publications
Timeliness – data refresh frequency, historical coverage
Cost – cost of subscription
Clinical trial – disease understanding, patient feasibility, site identification, principal investigator identification, patient recruitment, patient re-identification
Data science and AI – synthetic control arm and/or events contextualization, pragmatic clinical trial design and delivery, develop applicable AI/ML algorithm to identify biomarkers associated with disease/condition and predict disease onset/trajectory
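A framework like this is typically operationalized as a weighted scorecard. A minimal sketch with hypothetical weights and 1–5 scores for two de-identified vendors; the article does not disclose AstraZeneca’s actual weighting or scores:

```python
# Hypothetical dimension weights (sum to 1.0); AstraZeneca's are not disclosed.
WEIGHTS = {
    "coverage": 0.20, "granularity": 0.20, "accessibility": 0.10,
    "quality": 0.15, "timeliness": 0.10, "cost": 0.10,
    "clinical_trial": 0.10, "data_science_ai": 0.05,
}

# Illustrative 1-5 scores for two de-identified vendors.
vendors = {
    "Vendor A": {"coverage": 5, "granularity": 3, "accessibility": 4,
                 "quality": 4, "timeliness": 3, "cost": 2,
                 "clinical_trial": 5, "data_science_ai": 2},
    "Vendor B": {"coverage": 3, "granularity": 5, "accessibility": 3,
                 "quality": 5, "timeliness": 4, "cost": 4,
                 "clinical_trial": 3, "data_science_ai": 4},
}

def weighted_score(scores):
    """Sum of dimension scores weighted by importance."""
    return sum(WEIGHTS[dim] * value for dim, value in scores.items())

ranking = sorted(vendors, key=lambda v: weighted_score(vendors[v]), reverse=True)
```

Making the weights explicit is the point of such a scorecard: a team that prioritizes recruitment support over data depth can shift weight to the clinical-trial dimension and re-rank the same vendors.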
Wang shared AstraZeneca’s clinical trial feasibility vendor summary comprehensively comparing nine (de-identified) RWD vendors, in chart format, on each of 20 factors weighted heavily toward patients and sites. The company’s data science and AI vendor summary examines six U.S.-based RWD vendors (all but one offering EHR data) on 19 assessment criteria. No clear patterns have emerged, she notes.
The decision tree for recommending an RWD vendor, for oncology as well as other types of clinical trials, begins with patient feasibility and site identification support, says Wang. If this isn’t available, the next step is to inquire about patient recruitment support.
The vendor options presented to clinical teams include detailed information on data and geographic coverage in the EHR, Wang says. The summary report has highlighted sections for strengths, access and cost, considerations, and potential pilot.
“Data completeness, depth, and quality remain key hurdles,” she concludes. Answers lie beyond the data, in cross-functional teamwork and mutually beneficial partnerships.
The Duke-Margolis Center for Health Policy at Duke University has been a key partner of the FDA when it comes to shaping guidance, policies, and procedures around the use of RWD and RWE in evaluating product effectiveness. “Evaluating and communicating RWD’s fitness-for-use is challenging… [and] a minimum set of verification checks could be the first step,” Managing Associate Cristina Silcox, Ph.D., was quick to point out during her presentation at SCOPE.
Additional fitness-for-use checks are still necessary, she adds. “Fitness-for-use RWD is but one component that is required for RWE to be fit-for-purpose for regulatory decision-making.”
Among the potential roles of RWD, Silcox says, are to improve clinical trial efficiency by generating hypotheses, identifying drug development tools, assessing trial feasibility, informing prior probability distributions in Bayesian statistical models, identifying prognostic indicators or patient baseline characteristics for enrichment or stratification, and assembling geographically distributed research cohorts.
“RWE has its own value,” she continues, in terms of continually capturing the evolving standard of care and better reflecting routine clinical and self-care as well as outcomes that are relevant to patients and physicians. Real-world studies also allow broader inclusion criteria than traditional RCTs, and because the evidence is generated more efficiently they may require fewer resources, which is not to say it’s an easy undertaking.
The preliminary framework for FDA’s Real-World Evidence Program, published in 2018, “supports changes to labeling about drug product effectiveness” and one of the three big areas covered is whether RWD are fit for use, Silcox says.
As discussed in an October 2018 white paper published by Duke-Margolis, “RWD curation is complex and hard to explain,” says Silcox. “It depends on the research question, and RWD is not generated for research so it’s not necessarily well suited to the question at hand.” Plus, “substantial heterogeneity” may exist within and between RWD sources.
A second white paper published last fall describes a framework for evaluating RWD fitness for use via verification checks assessing its reliability. The rationale for the checks is quality control, quality assurance and documentation, plus they represent a standard way to communicate that data are fit for use, she says.
The FDA’s fit-for-use framework has a category for data reliability—quality control/assurance and the process for how data gets collected—and another for data relevancy, which is “research question-specific,” Silcox says. “So, we focused on quality control/assurance.”
Empirical research published in 2017 by Kahn et al. harmonized quality principles for EHRs across verification (local knowledge) and validation (gold standard) contexts, she notes. The paper examined data quality checks used by six data-sharing networks in the U.S., and most of them related to verification, making that the logical starting point for Duke-Margolis.
Kahn et al. put data quality principles into one of three buckets for conformance, completeness and plausibility, explains Silcox. Examples of verification checks include:
Are >95% of birthdate variables in MM-DD-YYYY format?
Do >85% of sex variables contain a single ASCII character?
What percent of encounter ID variables are missing data?
What percent of patients from a single institution have multiple record numbers?
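Checks like these are straightforward to automate against a patient table. A minimal sketch covering the first three examples; the field names, toy records, and thresholds are illustrative:

```python
import re

# Toy patient records; field names are hypothetical.
records = [
    {"birth_date": "03-12-1960", "sex": "F", "encounter_id": "E1"},
    {"birth_date": "1960/03/12", "sex": "M", "encounter_id": None},
    {"birth_date": "07-01-1955", "sex": "Female", "encounter_id": "E3"},
    {"birth_date": "11-30-1972", "sex": "M", "encounter_id": "E4"},
]

def fraction(flags):
    """Share of records for which a check is true."""
    flags = list(flags)
    return sum(flags) / len(flags)

# Conformance: birthdates in MM-DD-YYYY format (target: >95%).
birthdate_ok = fraction(
    bool(re.fullmatch(r"\d{2}-\d{2}-\d{4}", r["birth_date"])) for r in records
)

# Conformance: sex recorded as a single ASCII character (target: >85%).
sex_ok = fraction(len(r["sex"]) == 1 and r["sex"].isascii() for r in records)

# Completeness: fraction of encounter IDs missing.
encounter_missing = fraction(r["encounter_id"] is None for r in records)
```

Each check reduces to a pass fraction that can be compared against a pre-specified threshold, which is what makes a minimum set of verification checks communicable in a study protocol.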
The key next step is identifying the minimum set of verification checks, says Silcox. The checks and curation techniques should be pre-specified and justified in the study protocol. Acceptable thresholds will depend on the research question as well as the clinical and regulatory context. “Data heterogeneity isn’t bad, but it has to be explainable,” she stresses. “Results should always be contextualized in the data.”
Other next steps, continues Silcox, are developing a format for communicating check results and identifying data curation best practices by data source and fitness-for-use validation and relevancy checks.
Silcox referenced four pilot projects informing the assessment of fitness for use in regulatory decisions.
The FDA has teamed up with Harvard Pilgrim Health Care Institute and Harvard Medical School to develop uniform metadata standards for assessing and describing the quality, completeness, and stability of EHR data across data sources. Multiple federal agencies are also developing a meta-common data model to support harmonization of various models to allow researchers to better access RWD for patient-centered outcomes research.
The FDA, together with the University of California San Francisco and the Stanford Centers of Excellence in Regulatory Science and Innovation, is developing a single-point data capture approach from the EHR to an electronic data capture system as part of an FDA-regulated clinical trial using open, consensus-based standards. Meanwhile, the Clinical Data Interchange Standards Consortium is endeavoring to identify a minimum set of open-source standards to support data quality as part of a large-scale national learning health system.
Priority areas in 2020 for the Duke-Margolis RWE Collaborative are an RWE endpoints roadmap, external comparators, identifying shared real-world evidentiary opportunities, advancing international RWE efforts and patient-generated health data pilot implementation case studies. Its next RWE public meeting will be held on Oct. 20.