The Intersection of Real-World Data and Machine Learning for Clinical Research

By Allison Proffitt 

June 15, 2021 | Personalized medicine, as we use the term today, offers the promise of tailor-made treatment strategies for individual patients, but personalized medicine is really nothing new, said Jose-Felipe Golib-Dzib, project lead at Janssen R&D, to the virtual audience at last week’s DECODE: AI for Pharmaceuticals forum.

“The practice of medicine has always been about treating each patient individually,” Golib-Dzib said, “and clinicians have long observed that different patients respond differently to medical intervention.” 

Patients will benefit faster if we consolidate and enrich patient-level data, Golib-Dzib argued, augmenting clinical data with medical records, patient-reported data, claims data, medical imaging, genetic profiling, and more. The combined real-world data becomes regulatory grade when it is compiled and applied to a specific research question. 

“In addition to data harmonization, specialized tools are needed to implement validated methodologies to produce evidence in a timely, transparent, and reliable way,” he added.

Once the data are gathered and harmonized, platforms designed with privacy and local regulations in mind can enable learning across hospital systems. For example, Golib-Dzib presented a federated data model in which data custodians at different hospitals or research centers maintain their data in private databases. The data flow into local data models, which are connected with a global data model on a federated server.

“A federated learning in healthcare requires a bidirectional communication system to allow the emergence of an integrative global model that receives contributions iteratively from all data custodians. Therefore, each data custodian receives evidence based not only on their own data, but data from others that would remain unavailable if working in isolation.” 
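The iterative, bidirectional exchange Golib-Dzib describes can be illustrated with a minimal sketch. This is not the platform presented in the talk: it assumes a simple federated averaging scheme, a one-variable linear model, and synthetic data, and every function name below is hypothetical. Each custodian refines the current global model on data that never leaves its site, and the server combines only the returned model weights.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """One data custodian refines the global weights on its private data.

    Only the updated weights are returned; the raw records stay local.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)

def federated_average(updates, sizes):
    """Server step: average site weights, weighted by site sample counts."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def train_federated(sites, rounds=300):
    """Repeat the send-train-aggregate cycle for a fixed number of rounds."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Server sends the global model out; each site trains locally.
        updates = [local_update(global_weights, data) for data in sites]
        # Sites send weights back; server builds the next global model.
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights

def make_site(rng, n, lo, hi):
    """Synthetic stand-in for one hospital: y = 2x + 1 plus small noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.05)) for x in xs]

rng = random.Random(0)
# Three hypothetical custodians, each seeing a different range of x.
sites = [make_site(rng, 40, 0, 1), make_site(rng, 60, 1, 2), make_site(rng, 30, 0, 2)]
w, b = train_federated(sites)
print(f"global model: y = {w:.2f}x + {b:.2f}")
```

In this toy setup, each site sees only part of the input range, yet the shared model is fitted across all of it. Real deployments would add safeguards such as secure aggregation and governance controls that this sketch omits.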

These combined data will be more powerful than if each institution were limited to its own data. The model will allow us to progress more quickly from collection to analysis to prediction and, finally, to prescription.

“In the future, we can use the scientific insights obtained from the application of machine learning and artificial intelligence, for instance, to prevent and intercept diseases in novel ways,” Golib-Dzib said. “A future where we experience such a transformational coming-of-age having a positive effect in relatives, in our communities, and in all patients globally.”

Machine Learning Use Cases in Pharma 

With that vision cast, Christos Chatzichristos, a postdoctoral researcher at Janssen, outlined some of Janssen’s use cases for machine learning and the outcomes expected from broader use of ML.

Machine learning models can create novel insight into disease pathways, benchmark care pathways and their impact on outcomes, and predict future disease states, stratifying patients based on current clinical state and available history, Chatzichristos explained. He expects broader use of pragmatic trials, which will be designed with machine learning tools to study the real-world effectiveness of an intervention in a broader patient group, and ML-designed study control arms. 

“But how can we provide enough data to the machine learning models to give us accurate predictions?” Chatzichristos asked. The answer is augmented real-world data comprising EHR data, clinical data, imaging data, claims data, patient-reported outcomes, ‘omics data, and more. RWD can enhance the understanding of a patient’s disease trajectory, he said.

Unfortunately, RWD is heavily underutilized, he said. It is heterogeneous, unstructured, and of inconsistent quality, and it is siloed in medical archive systems. Significant issues also remain around data security, ownership, privacy, and patient consent.

Chatzichristos predicts that if we could fully use RWD, our current problem would be reversed.

“The main difficulty in biomedical machine learning applications is the lack of data,” he said. “With the use of real-world data obtained from everyday clinical routines, the problem can be reversed. We have a lot of data, but we still lack the knowledge of appropriate machine learning tools with which we will explore our big data.”  

Chatzichristos does not doubt that this future is near, though; it is only a matter of time and collaboration.

“Real-world data and machine learning models can facilitate personalized medicine,” he said. “We need the cooperation of clinicians, doctors, engineers, biomedical companies, universities, and of course the patients themselves in order to fully exploit the power of personalized medicine.”