LLMs Transform Clinical Trial Data Analysis with Human-Level Accuracy

By Clinical Research News Staff

September 4, 2025 | A study presented at the American Association for Cancer Research (AACR) conference reveals that large language models (LLMs) can match human experts in identifying critical cancer progression events from electronic health records—a development with significant implications for clinical trials and personalized medicine.

Dr. Aaron Cohen, head of oncology research at Flatiron Health, demonstrated that a Claude-based LLM produced real-world progression-free survival estimates nearly identical to those derived from trained human abstractors across 14 cancer types. The result addresses one of clinical research's most challenging data extraction problems.

"Endpoints like progression are the main data points that help us figure out how patients are doing and decide whether a drug is approved or not, so it is critical to get them right," Cohen explained. The ability to accurately identify these endpoints is crucial for transitioning from controlled clinical trial environments to real-world clinical practice.

Overcoming Traditional Limitations

For over a decade, Flatiron has relied on manual abstraction by expert human reviewers, a time-intensive process that has limited scalability in clinical research. The company's human abstractors, who have oncology backgrounds, follow rigorous protocols to identify cancer progression events and dates from complex medical records originally designed for billing purposes.

Previous machine learning approaches using natural language processing and deep learning models struggled with contextual challenges, particularly in accurately identifying progression dates among the numerous date references in patient charts.

The research introduces Flatiron's VALID (Validation of Accuracy for LLM/ML-Extracted Information and Data) framework, which benchmarks LLM performance against expert human abstractors using independent reference datasets. This approach ensures fair comparison and builds trust in automated systems for clinical decision-making.
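To make the benchmarking idea concrete, here is a minimal Python sketch of scoring two extractors (a human abstractor and an LLM) against the same independent reference dataset. It is an illustration only, not Flatiron's VALID implementation: the `Extraction` schema, the `score_against_reference` function, the choice of metrics (sensitivity, positive predictive value, date agreement), the 30-day date window, and all patient data are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Extraction:
    """One extractor's answer for one patient (hypothetical schema)."""
    patient_id: str
    progression_date: Optional[date]  # None means "no progression found"


def score_against_reference(extracted, reference, window_days=30):
    """Score one extractor against a reference dataset.

    The metrics and the 30-day window are illustrative assumptions.
    Raises KeyError if a patient is missing from the reference.
    """
    ref_by_id = {r.patient_id: r.progression_date for r in reference}
    tp = fp = fn = date_matches = 0
    for e in extracted:
        ref_date = ref_by_id[e.patient_id]
        if e.progression_date is not None and ref_date is not None:
            tp += 1  # both agree that a progression event occurred
            if abs((e.progression_date - ref_date).days) <= window_days:
                date_matches += 1
        elif e.progression_date is not None:
            fp += 1  # extractor found an event the reference does not have
        elif ref_date is not None:
            fn += 1  # extractor missed an event present in the reference
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "ppv": tp / (tp + fp) if tp + fp else None,
        "date_agreement": date_matches / tp if tp else None,
    }


# Made-up data: the same reference set is used for both extractors,
# so the comparison between them is like-for-like.
reference = [
    Extraction("p1", date(2024, 1, 10)),
    Extraction("p2", date(2024, 3, 1)),
    Extraction("p3", None),
    Extraction("p4", date(2024, 5, 20)),
]
human = [
    Extraction("p1", date(2024, 1, 12)),
    Extraction("p2", date(2024, 3, 1)),
    Extraction("p3", None),
    Extraction("p4", None),
]
llm = [
    Extraction("p1", date(2024, 1, 10)),
    Extraction("p2", date(2024, 4, 15)),
    Extraction("p3", date(2024, 2, 2)),
    Extraction("p4", date(2024, 5, 25)),
]

human_scores = score_against_reference(human, reference)
llm_scores = score_against_reference(llm, reference)
print("human:", human_scores)
print("llm:  ", llm_scores)
```

Scoring both extractors against one shared reference, rather than scoring the LLM against the human abstractors directly, keeps human error from being counted as model error.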

The framework requires clinician confirmation of progression events, focusing on clinical notes written by physicians actively caring for patients rather than relying solely on radiologist reports or pathology results from providers unfamiliar with individual cases.

Future Applications in Clinical Trials

The high-quality progression data extracted through LLMs will serve as building blocks for predictive modeling to improve treatment decision-making and optimize clinical trial enrollment. This technology promises to help clinicians identify the right patients for trials at the optimal time and flag when critical data are missing from patient records.

The approach also opens new possibilities for understanding physician decision-making processes—insights that could transform how clinical trials are designed and conducted.

For the full story, including detailed methodology, bias evaluation findings, and technical implementation details, read Deborah Borfitz’s article at Diagnostics World.