Finding More Patients for Clinical Trials by Unlocking Intelligence from Patient Records

Contributed Commentary by Vibhor Gupta, Pangaea Data 

September 30, 2022 | Clinicians and scientists need actionable information from patient records, including diagnostic and next-generation sequencing test results, in a privacy-preserving manner to make critical decisions in matching patients to clinical trials. Most software tools consider such structured data and some specific terms from doctors’ notes as important factors in selecting patients. However, such discrete values alone cannot capture the complexity of a patient’s health journey or trajectory. 

Properly stratifying patients for clinical trials requires utilizing all available intelligence from their patient records, including medical histories, referral notes, radiology, and clinical reports. This added detail helps map patients’ journeys and disease trajectories more accurately. 

Recent technical advances allow healthcare data to be mined for specific information, but these approaches have proven inefficient and hard to scale because they require access to large quantities of data. Gaining that access is challenging given privacy regulations, and even with access, such mining does not help discover new knowledge in the context of the patient’s journey. Hence, clinicians and scientists continue to review patient records manually to infer such intelligence and insights when determining suitability for a clinical trial. 

The sheer volume of patient records, which is growing by 40% annually by some estimates, impedes the already complex practice of assigning International Classification of Diseases (ICD) codes to patient records for billing purposes, resulting in significant miscoding. Consequently, algorithms designed to find patients for clinical trials using keywords or ICD codes miss many potential patients. 

Privacy-Preserving Patient Characterization Finds More Patients for Clinical Trials 

Despite major technological advances in clinical trials, finding patients and appropriately matched trials is still challenging. The problem is exacerbated when matching spans thousands of trial protocols and candidate patients. Manually sifting through thousands of patient records is neither sustainable nor scalable. 

Novel artificial intelligence (AI) methods have recently proven clinically valid and valuable for capturing clinical signatures characteristic of the target patient populations investigated by a clinical trial. These clinical signatures are generated from real-world patient records in the environments where the data resides, without the need to touch or move it. This methodology assures compliance with privacy regulations and ensures replicability across patient populations. 
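The idea of working on records where they reside can be illustrated with a minimal sketch. It is not the method described in this commentary; it only shows the general pattern in which each site computes a summary locally and shares nothing but aggregate counts. The function names, the `weight_loss_pct` field, and the 5% threshold are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, List

# A record is a flat mapping of numeric fields (hypothetical schema).
Record = Dict[str, float]


def local_summary(records: List[Record],
                  matches: Callable[[Record], bool]) -> Dict[str, int]:
    """Runs inside a site's own environment.

    Returns aggregate counts only; no row-level data leaves the site.
    """
    return {
        "screened": len(records),
        "matched": sum(1 for r in records if matches(r)),
    }


def combine(summaries: List[Dict[str, int]]) -> Dict[str, int]:
    """Runs centrally and sees only the per-site counts."""
    total = {"screened": 0, "matched": 0}
    for summary in summaries:
        for key in total:
            total[key] += summary[key]
    return total


# Hypothetical criterion: a single numeric threshold standing in for a
# far richer clinical signature.
def criterion(record: Record) -> bool:
    return record["weight_loss_pct"] >= 5.0


# Made-up records held separately at two sites.
site_a = [{"weight_loss_pct": 6.2}, {"weight_loss_pct": 1.0}, {"weight_loss_pct": 7.5}]
site_b = [{"weight_loss_pct": 0.5}, {"weight_loss_pct": 9.1}]

overall = combine([local_summary(site_a, criterion),
                   local_summary(site_b, criterion)])
print(overall)
```

In this sketch only the two small count dictionaries cross site boundaries, which is the property the paragraph above attributes to privacy-preserving approaches.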

Additionally, characterizing patient populations leads to new insights about diseases and patients’ health journeys, helping define new endpoints to optimize clinical trial protocols and qualify sites and data partners. Clinicians are vital to the scalability and evolution of such AI methods, helping explain the underlying scientific approach and methodology to build confidence in the clinical and scientific community. 

An example of how AI-generated clinical signatures can be applied across hard-to-diagnose conditions comes from the UK in the context of cancer-associated cachexia. Cachexia is a muscle-wasting condition developing in up to 80% of cancer patients, but the disease remains undiagnosed in 90% of patients who exhibit similar symptoms. As a result, patients with the disorder respond less well to treatment and experience comorbidities such as hypertension, anemia, and congestive heart failure, resulting in poorer patient outcomes. 

Some technical methods can quickly mine medical records for relevant ICD-9 codes to identify these cachexia-associated clinical features; however, taken individually, these features are comorbidities of many different medical conditions. As such, these methods identify large numbers of patients who turn out not to be cachectic at pretrial screening. 
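The over-selection problem described above can be shown with a small sketch. The records are made up, the ICD-9 codes are illustrative examples for the three features named earlier, and the "at least two features" rule is only a crude stand-in used to show why looking at features together narrows the candidate pool. It is not the AI method discussed in this commentary.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Illustrative ICD-9 codes for features the text associates with cachexia.
FEATURE_CODES = {
    "401.9": "hypertension",
    "285.9": "anemia",
    "428.0": "congestive heart failure",
}
FEATURE_SET = set(FEATURE_CODES)


@dataclass(frozen=True)
class Record:
    patient_id: str
    icd9_codes: FrozenSet[str]


def match_any_feature(records: List[Record]) -> List[str]:
    """Single-code matching: flags a patient on any one feature."""
    return [r.patient_id for r in records if r.icd9_codes & FEATURE_SET]


def match_min_features(records: List[Record], minimum: int) -> List[str]:
    """Hypothetical co-occurrence rule: requires several features together."""
    return [r.patient_id for r in records
            if len(r.icd9_codes & FEATURE_SET) >= minimum]


# Made-up records for illustration only.
RECORDS = [
    Record("P1", frozenset({"401.9"})),                    # hypertension only
    Record("P2", frozenset({"401.9", "285.9", "428.0"})),  # all three features
    Record("P3", frozenset({"250.00"})),                   # unrelated code
    Record("P4", frozenset({"285.9", "428.0"})),           # two features
]

print(match_any_feature(RECORDS))      # flags P1, P2 and P4
print(match_min_features(RECORDS, 2))  # flags P2 and P4 only
```

Here the single-code search also flags P1, whose only feature is hypertension, a finding common to many conditions. Requiring features to co-occur drops that candidate before screening.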

The novel AI methods referenced above help understand relationships between the features characteristic of cancer-associated cachexia, leading to the discovery of six times more clinically validated undiagnosed, misdiagnosed, and at-risk cachectic cancer patients. This discovery led to new medical knowledge for characterizing cancer-associated cachexia; a 50% reduction in the cost of treatment, potentially saving the UK’s NHS £1 billion; and a sixfold improvement in screening success rates, combined with improved patient outcomes. 

The productized application of these AI methods produced results in only four to six weeks. This rapid turnaround helped accelerate clinical decision-making in healthcare and clinical trial processes by several orders of magnitude. In addition, these methods are now being scaled across multiple sites in a federated, privacy-preserving manner, enabling the evolution of knowledge about cancer cachexia and providing clinicians and scientists with the latest intelligence in near real time. 

Because the intelligence and outputs from novel AI methods benefit clinicians in both healthcare and the pharmaceutical industry, they help bridge this ecosystem through knowledge discovery and sharing in a privacy-preserving manner, rather than through a transactional relationship based on access to data and technologies. The result is a more sustainable collaboration between these industries, opening new opportunities and a higher chance of realizing the full promise of precision medicine and preventative health. 

Vibhor Gupta has a breadth of experience in life sciences through his work in industry and academia over the past two decades. Before founding Pangaea, Gupta spearheaded the European business for Quantum Secure, an enterprise software solutions provider, which was acquired in 2015. Following this, he served as Senior Vice President of Commercial Strategy and Sales at Seven Bridges Genomics. He has also worked as a management and technology consultant for Deloitte. His academic career focused on conducting molecular biology studies and building AI algorithms with epigenetic, genomic, transcriptomic, and clinical trial data in the context of oncology and infectious diseases. Gupta has an extensive global network in the life sciences industry and is regularly invited to speak at international conferences, government-funded programs, and investment summits. He can be reached at