‘Simple’ NLP Tool Streamlines Metastatic Cancer Research

By Clinical Research News Staff 

October 21, 2025 | Researchers at the Medical University of South Carolina (MUSC) are using a “super-simple” natural language processing (NLP) tool to streamline clinical research for metastatic brain cancer, a field where rapid advances in therapy are outpacing traditional methods of data analysis. 

Stereotactic radiosurgery has become a common treatment for patients with brain metastases, but selecting the most effective approach depends heavily on knowing the origin of the cancer. Yet existing electronic health record (EHR) systems, which rely on International Classification of Diseases (ICD) codes primarily for billing and reimbursement, often fail to provide precise information about cancer subtypes or genetic markers. This gap complicates research efforts aimed at understanding outcomes, guiding treatments, and predicting complications such as radiation necrosis, which affects 5% to 10% of patients. 

To address this gap, Mario Fugal, Ph.D., a medical physicist in Charleston, South Carolina, led a study with MUSC colleagues that used NLP to extract the primary cancer type directly from clinical notes in EHRs. Published in JCO Clinical Cancer Informatics (DOI: 10.1200/CCI-24-00268), the study found the model correctly identified the cancer of origin in 90% of cases and achieved nearly perfect accuracy for common cancers like lung, breast, and skin. Remarkably, it could even classify subtypes of lung cancer, information that ICD codes cannot capture. By automating the identification of cancer phenotypes, researchers can now efficiently analyze large patient cohorts without spending hours manually combing through charts.

The MUSC team sees additional research applications, including predicting which patients might experience radiation necrosis after treatment. The task is challenging because the complication is rare and can arise months or even years after therapy, but success in this area could allow for risk modeling and improved patient stratification in future studies.

Jihad Obeid, M.D., director of MUSC’s Cancer Integrated Data-Enabled Resource (CIDER), notes that the low-resource NLP model performs exceptionally well within the narrow space of radiation oncology notes. CIDER provides an integrated infrastructure for researchers, connecting EHRs, genomic data, and tumor registries to facilitate exploratory studies, trial recruitment, and outcomes analysis. 

“As with any AI project, a bias evaluation of the NLP model will ideally happen at some point,” he adds.

The study also underscores the importance of validating NLP tools across institutions. While MUSC’s dataset included 1,461 patients and over 82,000 clinical notes, Fugal emphasizes that different note structures or physician documentation styles may affect performance.  

To read the full article by Deborah Borfitz, please visit Diagnostic World News.
