AI Model Estimates Treatment Effect on Par With Clinical Trials

By Deborah Borfitz

June 6, 2024 | Researchers at The Ohio State University (OSU) have been using artificial intelligence (AI) for several years now to emulate clinical trials by crunching real-world data (RWD). In their latest computational feat, they succeeded in precisely estimating results of randomized clinical trials comparing the effectiveness of two treatments for reducing the risk of stroke after coronary artery disease, reports computer scientist Ping Zhang, who leads the Artificial Intelligence in Medicine Lab at OSU. 

In this way, the deep learning framework—termed CURE (causal treatment effect estimation)—quantitatively increased performance by 7% to 8% over other methods that could only infer similar results, as reported recently in Patterns (DOI: 10.1016/j.patter.2024.100973). In AI predictive modeling, just a 2% to 3% improvement can be noteworthy, says Ruoqi Liu, a computer science and engineering Ph.D. student in Zhang’s lab whose model-constructing techniques added to CURE’s power. 

The two pretrained models that served as comparators were BEHRT, a deep neural sequence transduction model for electronic health records (EHRs), and Med-BERT, a bidirectional encoder representations from transformers (BERT) model adapted to the structured EHR domain. Since they utilize the standard BERT encoding method with only minimal modifications, they may not adequately capture the intricate hierarchical relationships and temporal irregularities inherent in patient data, says Liu, noting that CURE delivers good predictions with the proposed patient encoding method.

The performance of CURE and of a previously described framework known as KG-TREAT (AAAI Conference on Artificial Intelligence, DOI: 10.1609/aaai.v38i8.28727) was optimized by both pretraining and knowledge graphs to better understand high-dimensional and naturally sparse clinical data, she says. The model was first pretrained in an unsupervised fashion on three million patient records pulled from deidentified healthcare claims data covering nearly 300 diagnoses and more than 9,000 medications.

This foundation model was then further enhanced with knowledge graphs, representing biomedical concepts and relationships, where data were labeled for treatment effect estimation, explains Liu. An entirely different dataset was used so as not to “contaminate” the training data.  

Pretraining and fine-tuning aren’t novel AI concepts; only their application to the treatment estimation problem is new, she says. In the ecology field, for example, the paired technique is employed to isolate animal noises from forest audio recordings. Large language models like ChatGPT also rely on pretraining and fine-tuning to acquire a broad understanding of language as well as specific domain knowledge. 

CURE directly addresses the three major challenges in treatment effect estimation—encoding structured longitudinal observational data into sequence input, lack of a well-curated, large-scale pretraining dataset, and lack of real-world downstream tasks for benchmarking, Liu says. The model also employs a comprehensive embedding method to incorporate structure and time information to deal with complexities such as records that encompass multiple visits comprising various types of medications or diagnoses and the irregularity of the observational patient data. 
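The idea of turning multi-visit records into sequence input that carries structure and time information can be illustrated with a minimal Python sketch. It is not CURE's actual encoding method; the `Visit` class, the `encode_patient` function, the field names, and the sample codes are all hypothetical, chosen only to show how each diagnosis or medication token might be paired with its type, its visit position, and the number of days elapsed since the first visit.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Visit:
    """One encounter in a hypothetical patient record."""
    day: date
    diagnoses: list = field(default_factory=list)
    medications: list = field(default_factory=list)


def encode_patient(visits):
    """Flatten visits into parallel lists a sequence model could embed.

    Each token keeps its code type (structure), its visit index
    (hierarchy), and its day offset from the first visit (irregular timing).
    """
    ordered = sorted(visits, key=lambda v: v.day)
    encoded = {"tokens": [], "types": [], "visit_ids": [], "day_offsets": []}
    if not ordered:
        return encoded
    first_day = ordered[0].day
    for visit_id, visit in enumerate(ordered):
        offset = (visit.day - first_day).days
        for code_type, codes in (("dx", visit.diagnoses), ("rx", visit.medications)):
            for code in codes:
                encoded["tokens"].append(code)
                encoded["types"].append(code_type)
                encoded["visit_ids"].append(visit_id)
                encoded["day_offsets"].append(offset)
    return encoded


# Invented example record: two visits, given out of chronological order.
record = [
    Visit(date(2024, 3, 1), medications=["ramipril"]),
    Visit(date(2024, 1, 1), diagnoses=["I25.10"], medications=["aspirin"]),
]
encoded = encode_patient(record)
```

In a real model, each of the four lists would typically be mapped to its own embedding and the embeddings summed per token. The sketch stops at the point where the record has become aligned sequences.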

Traditional machine learning approaches aren’t sufficient for the task because they require a large amount of labeled data and can’t adequately address the confounding bias or capture the complex interplay between treatments, patient characteristics, and outcomes, Liu adds. The KG-TREAT framework synergizes large-scale observational data with the biomedical knowledge graphs to overcome these limitations. 
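Confounding bias is easiest to see in a small worked example. The Python sketch below uses invented counts for two hypothetical drugs, "A" and "B," in which sicker patients are more likely to receive drug A. It compares a naive risk difference with one adjusted by simple stratification on baseline risk. Stratification is used here only as a textbook illustration of adjusting for a confounder; it is not the method used by CURE or KG-TREAT.

```python
# (stratum, treatment, patients, strokes): invented counts for illustration only
COHORT = [
    ("high_risk", "A", 80, 24),
    ("high_risk", "B", 20, 8),
    ("low_risk", "A", 20, 2),
    ("low_risk", "B", 80, 16),
]


def naive_risk_difference(rows):
    """Stroke risk on A minus risk on B, ignoring who received which drug."""
    totals = {"A": [0, 0], "B": [0, 0]}
    for _, treatment, patients, strokes in rows:
        totals[treatment][0] += patients
        totals[treatment][1] += strokes
    risk_a = totals["A"][1] / totals["A"][0]
    risk_b = totals["B"][1] / totals["B"][0]
    return risk_a - risk_b


def adjusted_risk_difference(rows):
    """Average the within-stratum risk differences, weighted by stratum size."""
    by_stratum = {}
    for stratum, treatment, patients, strokes in rows:
        by_stratum.setdefault(stratum, {})[treatment] = (patients, strokes)
    total_patients = sum(patients for _, _, patients, _ in rows)
    effect = 0.0
    for arms in by_stratum.values():
        n_a, strokes_a = arms["A"]
        n_b, strokes_b = arms["B"]
        weight = (n_a + n_b) / total_patients
        effect += weight * (strokes_a / n_a - strokes_b / n_b)
    return effect


naive = naive_risk_difference(COHORT)
adjusted = adjusted_risk_difference(COHORT)
print(f"naive: {naive:+.2f}, adjusted: {adjusted:+.2f}")
```

With these made-up numbers, the naive comparison makes drug A look slightly worse, while the adjusted estimate shows it lowering stroke risk by about 10 percentage points in both risk groups. Real observational data involve many confounders at once, which is why simple stratification does not scale and more expressive models are needed.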

During the COVID pandemic, Zhang and Liu came up with a drug repurposing framework where treatment effects were estimated by mimicking a randomized clinical trial for each ingredient in different drugs. Their aim was to speed up hypothesis generation and reduce translational problems by leveraging observational data on humans rather than preclinical data.

The application of RWD remains the common denominator in their modeling work. The U.S. Food and Drug Administration (FDA) now accepts evidence that comes from analyzing RWD to support its assessment of new indications for approved drugs, notes Zhang. 

RWD is Gamechanger

Replacing clinical trials is not the goal here, says Liu, but rather accelerating them by generating some conclusions about drug candidates and the specific indications for which they are best suited. Clinical research can therefore test the right compounds based on the behavior of similar ones in real-world use, as well as start customizing treatments to different subpopulations.  

In the latest paper, CURE was tasked with identifying the drugs most effective at reducing the risk of stroke, the leading cause of death from heart disease. Specifically, the model was used in a head-to-head comparison of rivaroxaban vs. aspirin, valsartan vs. ramipril, ticagrelor vs. aspirin, and apixaban vs. warfarin.  

Well over a decade ago, computational biologists at GlaxoSmithKline began pioneering AI-based drug repurposing methods, says Zhang, which initially focused on the structural features of compounds or proteins, genome-wide association studies, transcriptional responses, and gene expression. The newer addition of longitudinal RWD as a data source has been the gamechanger. 

The clinical development stage at which RWD is useful in estimating treatment effect depends on the goal of the clinical trial that is to be emulated, he continues. For phase 4 trials, for example, his lab is currently working on developing early signal detection algorithms based on the response of patients taking similar drugs in the real world that can inform study design.  

For drug repurposing tasks, which exploded in popularity during the pandemic, treatment effect estimation is beneficial primarily in the realm of drug discovery in terms of expanding use of an existing medicine to a new disease, says Zhang. That generally happens after phase 1 but before phase 3 trials begin. 

Large Language Modeling

The long-term vision with CURE is that it be part of an AI-powered decision support tool embedded in EHRs providing physicians with immediate access to the digital twin of individual patients to help guide treatment choices. RWD is needed to feed into the large language models used in building this type of generative AI application, points out Zhang. 

When it comes to estimating drug effect, generative AI could theoretically train on millions of patient records from across the country. But the fine-tuning part to uncover the right treatment case by case will take just a fraction of that, perhaps a couple thousand records, Zhang says. 

“We’re a friend of the real-world evidence but also the most recent AI concepts to make sense of the clinical problem of [processing] patient real-world data,” he says. As announced late last year, Roche is already working with Nvidia to accelerate drug discovery using generative AI where treatment effect estimation is involved. 

Possibly before the end of the year, Zhang adds, his Artificial Intelligence in Medicine Lab will also be actively working with large language models—both to mimic clinical trials and to remain on the cutting edge of the computer science domain. The lab is one of the top teams doing AI in medicine, he notes. 
