Machine Learning Adds Speed To Electronic Health Records-Based Phenotyping

By Deborah Borfitz

September 21, 2021 | In clinical informatics research, creation of rule-based algorithms is a robust method for mining electronic health records (EHRs) to classify diagnoses. But the process requires considerable time and expertise to build, refine, and deploy, which explains why to date only 81 phenotypes have a publicly available algorithm warehoused in the Phenotype KnowledgeBase (PheKB), according to Benjamin S. Glicksberg, Ph.D., assistant professor of genetics and genomic sciences at Mount Sinai’s Icahn School of Medicine and Hasso Plattner Institute for Digital Health.

That number could start to grow now that Mount Sinai scientists have shortened the timeline with an automated, machine learning-based algorithm that can be used as a starting point for phenotyping, he says. Eventually, it could enable large-scale analyses across hundreds of diseases simultaneously.

The open-source Phe2vec framework, described in an article recently published in Patterns (DOI: 10.1016/j.patter.2021.100337), is largely the brainchild of Riccardo Miotto (now director of machine learning at Tempus Labs), with whom Glicksberg developed the method. For nine of 10 diseases, Phe2vec performed at least as well as the traditional gold-standard method, assessed by expert manual chart review, in identifying afflicted patients.

Currently, to develop a disease-specific algorithm, researchers must first comb through reams of medical records looking for combinations of data (e.g., lab tests or prescriptions) that are uniquely associated with the condition and have the output manually double-checked, Glicksberg explains. Even brilliant clinicians, researchers, and computer scientists who brainstorm a set of intuitive rules for a given disease often must continually refine and adapt their criteria due to the messy nature of EHR data. The process typically gets repeated whenever researchers want to study a new disease.

Phe2vec learns how to spot disease phenotypes on its own, using unsupervised embedding algorithms to find connections between data and disease, he says. It was trained on tens of millions of data points from the nearly two million patients of the Mount Sinai Health System.
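In broad strokes, and only as an illustration rather than the exact pipeline described in the paper, the idea is that once every EHR concept has a learned vector, a patient can be scored by how close the concepts in their record sit to a disease "seed" concept. In the sketch below, the helper functions, concept codes, and the 0.6 cutoff are all hypothetical.

import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_patient(patient_codes, seed_code, embeddings):
    # Average the patient's recorded concept vectors and compare to the disease seed.
    # `embeddings` maps concept codes (diagnoses, labs, drugs) to learned vectors.
    vecs = [embeddings[c] for c in patient_codes if c in embeddings]
    if not vecs:
        return 0.0
    return cosine(np.mean(vecs, axis=0), embeddings[seed_code])

# Hypothetical usage: flag likely type 2 diabetes patients.
# candidates = [p for p in patients if score_patient(p.codes, "ICD10:E11", embeddings) > 0.6]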

The data represents patients and care processes of five hospitals in the Mount Sinai Health System across multiple boroughs in New York City, says Glicksberg. Relative to health systems elsewhere, the population it serves is very diverse.

Gold-Level Performance

Healthcare informatics research today often utilizes retrospective patient data in EHRs and relies on ICD billing codes as proxies for disease status, says Glicksberg. But EHRs are not reliably accurate as they were not designed for research—a diagnosis of irritable bowel syndrome might get missed among people assessed in a psychiatric setting, for example, or the billing code affiliated with diabetes could get used for people who were assessed for the disease but did not in fact have it.

Thoughtful researchers therefore began building disease-specific rules for retrospective data based on clinical characteristics, he says, such as requiring a lab test within the last six months with a result in a certain value range, the presence of some billing codes but not others, and a prescription for a particular set of medications. The electronic phenotyping algorithms would then get benchmarked against manual review and, if shown to produce valid results (often with external validation), would get uploaded into PheKB for public dissemination.
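For context, a rule-based phenotype of the kind Glicksberg describes might look roughly like the sketch below. The billing codes, lab threshold, lookback window, and medication list are invented for illustration and are not drawn from any actual PheKB algorithm.

from datetime import datetime, timedelta

# Hypothetical rule set for a type 2 diabetes phenotype; real PheKB definitions are more elaborate.
INCLUDE_CODES = {"E11.9", "E11.65"}        # billing codes counted toward the phenotype
EXCLUDE_CODES = {"E10.9"}                  # a type 1 diabetes code excludes the patient
MEDICATIONS = {"metformin", "glipizide"}   # illustrative medication list

def meets_phenotype(patient, as_of=None):
    # `patient` is assumed to be a dict with "icd_codes", "labs", and "medications" fields.
    as_of = as_of or datetime.now()
    lookback = as_of - timedelta(days=180)  # lab result must fall within the last six months
    has_code = bool(INCLUDE_CODES & patient["icd_codes"]) and not (EXCLUDE_CODES & patient["icd_codes"])
    recent_lab = any(lab["name"] == "HbA1c" and lab["value"] >= 6.5 and lab["date"] >= lookback
                     for lab in patient["labs"])
    on_med = bool(MEDICATIONS & patient["medications"])
    return has_code and recent_lab and on_med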

“The problem is [the algorithms] take a while to develop … so there is a limited number of diseases that have one,” continues Glicksberg. Researchers often opt instead to simply use “silver standard” ICD billing codes in their work and acknowledge their imperfections, which can bias study findings.

That’s what prompted Glicksberg and Miotto to see if the data might “speak for itself,” he says. In addition to speeding up the overall process for building rule-based algorithms, the Phe2vec method they developed took no more than a day to train.

For their recent study, they put it to the test in identifying 10 diverse diseases with existing PheKB algorithms: abdominal aortic aneurysm, atrial fibrillation, attention deficit hyperactivity disorder, autism, Crohn’s disease, dementia, herpes zoster, multiple sclerosis, sickle cell disease, and type 2 diabetes mellitus.

The performance of the Phe2vec and PheKB algorithms was compared head-to-head using manual chart review of progress notes. Overall, Phe2vec and PheKB achieved an average positive predictive value of 0.94 and 0.82, respectively.
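For reference, positive predictive value is the share of algorithm-flagged patients whose disease is confirmed on chart review: PPV = true positives / (true positives + false positives). As a purely illustrative calculation, if 94 of 100 flagged charts were confirmed, the PPV would be 94/100 = 0.94.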

It matters less that Phe2vec performed modestly better than existing algorithms than that it was in most cases “as good as” the rigorously defined rules in PheKB across a sampling of disease and data types, he says. Autism was the exception, which likely relates to the fact that its diagnosis is based less on quantitative measurements than on an interview process in which physicians’ insights and impressions are captured narratively.

Phe2vec currently relies on word embedding algorithms, namely Word2vec, originally developed to learn relationships between words in text. But more advanced natural language processing techniques could be applied to future iterations of the methodology to take in raw text from physician notes “more comprehensively” and hopefully improve its performance on diseases that rely on insights from observational data, says Glicksberg.
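A minimal sketch of how such concept embeddings might be learned, assuming the gensim library (version 4 or later) and treating each patient's time-ordered sequence of coded EHR events as a "sentence"; the toy corpus and parameter settings below are illustrative, not those reported in the paper.

from gensim.models import Word2Vec

# Each "sentence" is one patient's time-ordered EHR concepts (diagnoses, labs, drugs).
patient_sequences = [
    ["ICD10:E11.9", "LOINC:4548-4", "RXNORM:metformin"],   # illustrative patient 1
    ["ICD10:I48.0", "RXNORM:apixaban", "CPT:93000"],       # illustrative patient 2
]

model = Word2Vec(
    sentences=patient_sequences,
    vector_size=100,   # embedding dimensionality
    window=5,          # co-occurrence context width within a record
    min_count=1,       # keep rare concepts in this toy corpus
    sg=1,              # skip-gram variant
)

# Concepts that co-occur across many records end up close together in the embedding space.
print(model.wv.most_similar("ICD10:E11.9", topn=3))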

Developers are also trying to create ways to embed quantitative information from modalities other than structured and unstructured EHR data—such as imaging and electrocardiograms—into disease representations, he adds.

Now And Later

The research team has submitted proposals to assess the similarity of the data warehoused by the Mount Sinai Health System to that of other systems, Glicksberg says. It also wants to characterize diseases more comprehensively by learning representations of various conditions from multiple health systems at the same time.

This will be done in a privacy-preserving fashion through federated learning, so no identified data leaves the secure environment of participating health systems, he notes.

Outside researchers can start using Phe2vec for retrospective EHR-based analysis at their own institution now, based on the code that Miotto has released, says Glicksberg. But it will be a while before Phe2vec gets any pragmatic and prospective clinical use.

The goal in any case is “learning representations of diseases,” including how data is entered in the records of health systems and how it can be leveraged for retrospective analyses, he says. Among the potential applications is expediting clinical trial enrollment by prioritizing candidates who might fit the inclusion and exclusion criteria. If genetic data gets added to the representations, Phe2vec might also be of utility in identifying people for research on rare diseases.

The “beauty” of such an approach is that it can crunch a lot of data across many modalities at once, says Glicksberg. In theory, Phe2vec could one day aid in decision-making by clinical practitioners regarding needed screenings and diagnostic tests.