Machine Learning Can Predict If COVID-19 Trials Will Succeed

By Deborah Borfitz 

July 29, 2021 | A pair of computer scientists at Florida Atlantic University have come up with a machine learning approach to predict the likelihood of a clinical trial being terminated down the road and to identify the factors contributing to study termination or success. When applied to the flurry of COVID-19 trials launched since early last year, it performs particularly well, which would likely also be the case for studies across other major disease groups and therapeutic areas, according to Xingquan “Hill” Zhu, Ph.D., professor in the department of computer and electrical engineering and computer science.

Zhu teamed up with Magdalyn "Maggie" Elkin, a second-year Ph.D. student in computer science, to produce both the general framework for predicting trial termination (DOI: 10.1038/s41598-021-82840-x) and, as most recently published in PLOS ONE (DOI: 10.1371/journal.pone.0253789), its specific application to 5,193 COVID-19 studies registered on ClinicalTrials.gov.

The model demonstrated impressively good prediction results, a 0.87 area under the curve (AUC) and 81% balanced accuracy (the average of sensitivity and specificity), on 909 completed COVID-19 trials and another 191 that ceased, including studies that were terminated, withdrawn, or suspended. The original model, built using 311,260 trials across multiple therapeutic areas, produced “satisfactory” prediction results with just over 0.73 AUC and 67% balanced accuracy.
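
For context, both metrics are standard and straightforward to compute. Here is a minimal scikit-learn sketch with invented labels and scores, not the study's data:

```python
# Minimal sketch of the two reported metrics using scikit-learn; the
# labels and scores below are invented for illustration, not taken
# from the study.
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                   # 1 = ceased, 0 = completed
y_score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3]  # predicted probabilities
y_pred = [int(s >= 0.5) for s in y_score]           # thresholded labels

# AUC: chance that a randomly chosen ceased trial is scored higher
# than a randomly chosen completed one.
print(roc_auc_score(y_true, y_score))

# Balanced accuracy: the mean of sensitivity (recall on ceased trials)
# and specificity (recall on completed trials), which keeps a 909-to-191
# class imbalance from inflating the score.
print(balanced_accuracy_score(y_true, y_pred))
```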

Clinical trial companies might think about deploying the model at the study design phase to gauge the likelihood of trial success under various potential scenarios—if more study sites are added, for example, or fewer patients get a placebo. One company is already in discussion with Zhu about the methodology and available data sources, he says, suggesting industry interest in adopting machine learning to ratchet down the ever-increasing cost of running trials.

Much like a patient needing referral from a general practitioner to a specialist to be accurately diagnosed, prognosticating on trial success or failure is best done by homing in on trials targeting a certain disease, says Zhu. “The same methodology applies, but not all of the features.” 

The method considers clinical trial administration, eligibility, study information, criteria, drug types, and study keywords, along with the embedding features commonly used in state-of-the-art machine learning, to predict trial termination.
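
The article does not publish the feature pipeline itself, but a rough, hypothetical sketch of how such heterogeneous registry fields are often combined might look like this, with invented column names and TF-IDF standing in for the embedding features:

```python
# Hypothetical sketch of combining heterogeneous trial features; the
# column names are invented, and TF-IDF stands in for the embedding
# features the authors describe.
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder

features = ColumnTransformer([
    # Categorical registry fields (e.g., sponsor type, phase, drug class)
    ("registry", OneHotEncoder(handle_unknown="ignore"),
     ["sponsor_type", "phase", "drug_class"]),
    # Free-text study keywords turned into weighted term vectors
    ("keywords", TfidfVectorizer(max_features=500), "keywords_text"),
])
# features.fit_transform(trials_df) would then yield one numeric row per trial.
```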

“For the COVID-19 analysis, we injected some specific features to characterize the drug because a significant number of trials involve repurposing” where existing medications are being investigated for new therapeutic purposes, he explains. Drug features and study keywords were the most informative features overall, but insufficient on their own to reliably predict success or failure. 
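
The article doesn't detail how that informativeness was measured; one common approach, illustrated below on synthetic data rather than the study's registry features, is to rank features by a random forest's impurity-based importances:

```python
# Sketch of ranking feature informativeness with a random forest's
# impurity-based importances; the data here is synthetic, not the
# registry features from the study.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# Print the three highest-importance features, analogous to asking
# whether drug features or keywords carry the most signal.
for i in rf.feature_importances_.argsort()[::-1][:3]:
    print(f"feature {i}: importance {rf.feature_importances_[i]:.3f}")
```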

Useful Information 

COVID-19 studies differ in several significant ways from other types of clinical trials, Zhu shares. For starters, the proportion of interventional trials that terminate is much higher. They account for nearly half of all COVID-19 studies but about 93% of the terminations versus 82% and 12%, respectively, among the larger universe of trials.

Among COVID-19 trials that were terminated, 37.5% used a placebo group, nearly five times the percentage among completed trials, says Zhu. By contrast, only 15% of all registered trials use a placebo, indicating that having a placebo group is a significant factor in trial cessation, possibly because of insufficient patient enrollment.

Both across trials and among COVID-19 trials only, industry-sponsored trials were more likely to be terminated than those sponsored by federal grants, says Zhu. This likely reflects the motivation of companies to terminate studies that don’t meet their market-oriented objectives.

Perhaps unsurprisingly, hydroxychloroquine was the most frequent drug intervention (almost 10%) but represented nearly one-third of the terminated trials. Looking at drug class rather than individual drugs allows multiple concurrent trials researching similar drug interventions to be quantified without omitting less popular drugs under investigation, says Zhu. It also serves to highlight factors separating trials within a class, providing a good deal of useful information to drug developers. 

Keywords help provide context for the analysis. The term “cytokine” ranked high only among ceased trials, presumably because those studies target the cytokine storm associated with severe COVID-19 infections and may carry a higher risk of unsuccessful outcomes to begin with. Conversely, words like “depression” and “anxiety,” indicative of lower-risk observational trials, ranked high only among completed trials.

Building Trust 

Learning from experience in this way requires “a substantial amount of data,” Zhu notes, as would also be the case for studies testing interventions in high-target areas such as heart disease. If only a few records are available to tap, machine learning is going to “overfit” to the small training set, memorizing its idiosyncrasies rather than generalizing, and will likely miss many of the true commonalities driving trial outcomes.
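
A minimal illustration of that overfitting risk, on synthetic data rather than trial records:

```python
# Synthetic illustration of the small-data risk Zhu describes: with
# few training rows and many features, the model memorizes the
# training set and fares far worse on unseen records.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=60, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(model.score(X_tr, y_tr))  # near-perfect on the 30 training rows
print(model.score(X_te, y_te))  # noticeably weaker on held-out rows
```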

Zhu says his department frequently collaborates with industry, although clinical trials are a relatively new area of focus. The work specific to COVID-19 is being supported by funding from the National Science Foundation and enabled by Elkin’s undergraduate and graduate training in physiology and biotechnology, familiarity with biomedical terms, and clinical development know-how. That body of work extends to a study attempting to make predictions from the growing volume of patient-level symptomatology and diagnostic data.

One priority for Zhu is to continue feeding the COVID-19 predictive model more data as more of those trials reach completion, to ensure it remains reliable. As of July 27, more than 6,240 COVID-19 clinical trials had been registered through ClinicalTrials.gov and close to 1,600, up from 772 in January 2021, had reported completion or cessation status.

Moving forward, many other study features might be explored for modeling purposes based on the wealth of publicly available data in the trial registry, says Zhu. “I think there are a lot of opportunities for people who have an interest in research or want to put [this sort of knowledge] into commercial use.” 

Anyone with an advanced computer science degree could produce similar analyses, he continues. The four machine learning models utilized—random forest, neural network, logistic regression, and XGBoost (ideal for large datasets)—are “commonly used in both academia and industry and actual performance is also very good.”

Using the machine learning methods together “avoids our team being biased by some best-performing or worst-performing models,” he says. When tested independently, none of the four models performed better than in concert with the others. 
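
The article does not specify how the four models' outputs are combined. A common scheme, sketched below with generic configurations rather than the paper's exact hyperparameters, is soft voting over the models' predicted probabilities:

```python
# Hedged sketch of an ensemble over the four model families named in
# the study; the configurations are generic and soft voting is an
# assumption, not necessarily the paper's exact combination scheme.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier  # gradient boosting, suited to large datasets

ensemble = VotingClassifier(
    estimators=[
        ("random_forest", RandomForestClassifier(n_estimators=300)),
        ("logistic_regression", LogisticRegression(max_iter=1000)),
        ("neural_network", MLPClassifier(hidden_layer_sizes=(64, 32))),
        ("xgboost", XGBClassifier(n_estimators=300, eval_metric="logloss")),
    ],
    voting="soft",  # average class probabilities across the four models
)
# ensemble.fit(X_train, y_train)
# ensemble.predict_proba(X_new)[:, 1]  # estimated probability a trial ceases
```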

The more fundamental question is how comfortable people will feel using the ensemble method if, for example, it predicts an 80% or 90% chance of trial failure, says Zhu. Trust in the output is key, which is why he and Elkin have endeavored to make their predictive model as transparent and interpretable as possible.