EHRs + Machine Learning Decipher Drug Effects In Pregnant Persons

By Deborah Borfitz

June 22, 2020 | Researchers at Vanderbilt University Medical Center are using a novel, data-driven “target trial” framework to investigate the efficacy and safety of medicines in pregnant populations, who are underrepresented in randomized controlled trials (RCTs). The approach leverages observational data in electronic health records (EHRs) to spot connections between real-world drug exposures during pregnancy and adverse outcomes in the women’s offspring—an exercise that can be expedited with commonly used machine learning models like logistic regression.

So says Vanderbilt undergraduate student Anup Challa, founding director of the investigative collaboration MADRE (Modeling Adverse Drug Reactions in Embryos). The group works in the emerging field of PregOMICS that applies systems biology and bioinformatics to study the efficacy and safety of drugs in treating a rising tide of obstetrical diseases. Partnering institutions are Northwestern University, the National Institutes of Health, and the Harvard T.H. Chan School of Public Health.

The concept of target trials was first mentioned in epidemiology literature a decade ago, says Challa, and is only starting to gain traction. “Target trials really hinge on retrospective analysis of existing data using machine learning methods or other kinds of inferential statistics.”
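As a rough illustration of the kind of retrospective analysis Challa describes, the sketch below fits a logistic regression to a small, invented cohort and reads off an adjusted odds ratio for a drug exposure. It is not the MADRE group's code: the counts, the single confounder (an "older mother" flag), and the function name fit_logistic are all assumptions made for this example, and a real target trial would adjust for many more covariates.

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    Returns [intercept, coef_1, coef_2, ...].
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Invented counts (assumption): (exposed, older_mother) -> (affected children, total)
strata = {
    (0, 0): (4, 40),
    (0, 1): (8, 40),
    (1, 0): (10, 40),
    (1, 1): (16, 40),
}

X, y = [], []
for (exposed, older), (cases, total) in strata.items():
    X += [[exposed, older]] * total
    y += [1] * cases + [0] * (total - cases)

weights = fit_logistic(X, y)
# Odds ratio for exposure, adjusted for the "older mother" flag
adjusted_or = math.exp(weights[1])
print(f"adjusted odds ratio for exposure: {adjusted_or:.2f}")
```

An odds ratio above 1 in a model like this would count only as a signal to follow up, which is consistent with Aronoff's caution that a model output alone does not establish a safety problem.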

A study coming out of Vanderbilt a few years ago looked at the effects of pregnant patients’ genomics on outcomes in their neonates and found harmful single-nucleotide mutations on key maternal genes whose effects mimicked those of inhibitory drugs, says Challa. Specifically, the research team conducted a target trial and learned that such mutations on PCSK9, a gene that controls cholesterol levels, led mothers to deliver babies with spina bifida.

That was a signal that mothers ought not to be taking PCSK9 inhibitors, Challa continues, which are “becoming of increasing interest to physicians for treating hypercholesterolemia.” It also meant common genetic variants could serve as a proxy for drug exposures in target trials when insufficient prescription data exist in pregnant people’s records.

A probability value generated by a machine learning algorithm would not be “sufficiently indicative” of a drug safety signal to warrant immediate interrogation in humans, says David Aronoff, M.D., director of the division of infectious diseases at the Vanderbilt University Medical Center. But, as he and his MADRE colleagues argued in a recent paper published in Nature Medicine (DOI: 10.1038/s41591-020-0925-1), target trials are a viable and potentially more definitive way to assess fetal safety than animal models or cellular response to a drug in a dish.

The ultimate goal with target trials is to simulate the level of safety and efficacy testing done in RCTs with non-pregnant populations as a matter of health equity for people who for ethical or logistical reasons can’t be enrolled, says Challa. But where they fit into the regulatory framework for drugs has yet to be defined, or even explored.

Next Step: Tissue Modeling

Aronoff thinks of target trials as “reverse engineering” the normal drug development process, which typically starts in a petri dish on the bench, then advances to animal models and finally to clinical trials outside of the pregnant population that (if all goes well) lead to an indication for use. “We’re trying to take existing, real-world data about the use of those drugs in pregnancy to identify [safety] signals… some sort of problem in the development of the fetus in utero that ends up showing itself either during pregnancy or postpartum in the offspring. If there is a mechanistic basis for that, then we can now go backwards to the bench and try to understand whether there is a causal relationship.”

Organ-on-a-chip technologies and other advances in tissue modeling can be especially informative in recapitulating drug exposure, “particularly in the context of what is happening in the pregnant uterus,” says Aronoff. His MADRE colleague Ethan Lippmann, Ph.D., in Vanderbilt’s department of chemical and biomolecular engineering, has been building three-dimensional models of brain development that could be used as a platform for testing the teratogenic effects of drugs (or metabolites of those drugs) on neural development and neural outcomes like seizure disorders or microcephaly.

Aronoff, who is also a professor of obstetrics and gynecology at Vanderbilt, is keenly interested in seeing three-dimensional organotypic models of the placenta exposed to various drugs, metabolites and toxins of interest—and serially to other organ models that might include the brain, heart and musculoskeletal system. The different models could be viewed as “cartridges” that get plugged in based on signals seen in the machine learning study.

“We’re trying to look at organ development and organ function in this better, more innovative context,” says Aronoff, which would add to what is learned from target trials.

The potential of target trials is both about discovering and investigating drug safety, says Aronoff. “Most drugs have never been clinically tried in a randomized, placebo-controlled way in pregnancy and, even if they have been, it’s uncertain that anyone was paying close attention to outcomes not only for the fetus but in early childhood and [beyond]. But when you have electronic health records that couple mothers and their exposures with their offspring even sometimes years later, you have the power to discover for the first time an association that no one knew about.”

That first level of discovery—e.g., a higher level of prevalence of schizophrenia or autism or asthma in childhood due to exposure to a drug in the womb—prompts questions about whether the association has a mechanistic basis that may be revealing of fundamental aspects of human development, he notes.

Indeed, it should be possible to use target trials as a first step in identifying whether or not diseases that occur later in life are linked to an earlier stimulus or cause, adds Challa. The story of an individual’s health is influenced by factors not immediately visible, including exposures in utero that can lead to lifelong disease.

EHRs could provide researchers with the ability to evaluate people’s health from the time they were in their mother’s uterus until late in life, so they can start to think from a “systems perspective,” says Challa. When tapped by target trials, they greatly enlarge the information available to guide therapeutic choices and inform drug safety.

QSAR Technique

Many databases and patient registries exist for reproductive toxicology and the reporting of significant adverse events. But the information isn’t available in a form that’s easily manipulated by machine learning models, says Challa, making it challenging to arrive at statistically rigorous results.

The problem extends to Food and Drug Administration and National Institutes of Health datasets used in a recent study appearing in Reproductive Toxicology. “What we found and continue to find is that the data out there is not at the level it should be” for informing prescribing behavior at the point of care for pregnant women and their developing fetuses, Challa says.

The study was attempting to identify chemical features of a drug that would be predictive of its teratogenic potential and could be fed into a machine learning model to formalize those associations, he explains. Specifically, researchers looked at whether or not adverse outcomes have an “inherent structural rationale” and, if so, if a meta-structural analysis might be performed to identify known pharmacological variables (e.g., absorption, distribution, metabolism, and excretion profile) that may be the culprits. They also accessed real-world laboratory data to look for chemical structures associated with markers of disease in human tissue samples.

Recognition of the conflicting nature of adverse events data within patient registries was a key takeaway of the study, says Challa, and gave researchers “even more impetus” to focus on EHRs as a data source. But it also gave the team some structural information predictive of an adverse outcome that they can now use to cross-validate results produced by their target trial framework.

The paper highlighted a novel application of machine learning, the quantitative structure activity relationship (QSAR), to learn about the structures of drugs and their pharmacological behaviors that are associated with teratogenicity. QSAR should also be able to make similar predictions for any new compound, says Aronoff. “It’s a separate way [than EHR mining] of interrogating drug safety to look for associations.”
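The core QSAR idea, that compounds with similar structures tend to behave similarly, can be sketched with a simple similarity-weighted nearest-neighbor score. This is a stand-in technique chosen for brevity, not the model used in the Reproductive Toxicology paper. The feature labels ("f1", "f2" and so on), the reference set, and the function names are hypothetical; in practice the features would come from a cheminformatics toolkit and the labels from curated toxicology data.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def predict_risk(query, reference, k=3):
    """Similarity-weighted average label of the k most similar reference compounds.

    Returns None when the query shares no features with its k nearest neighbors.
    """
    scored = sorted(
        ((tanimoto(query, feats), label) for feats, label in reference),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        return None
    return sum(sim * label for sim, label in scored) / total

# Hypothetical reference compounds: (structural features, 1 = flagged, 0 = not flagged)
reference = [
    ({"f1", "f2", "f3"}, 1),
    ({"f1", "f2", "f4"}, 1),
    ({"f5", "f6"}, 0),
    ({"f5", "f6", "f7"}, 0),
    ({"f2", "f5"}, 0),
]

new_compound = {"f1", "f2", "f3", "f4"}
score = predict_risk(new_compound, reference)
print(f"similarity-weighted risk score: {score:.2f}")
```

A score near 1 means the new compound's closest structural neighbors were flagged. As with the EHR-derived signals, that would be a prompt for bench work rather than a verdict.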

The two techniques are related in that any unwanted drug effects in a fetus or offspring that are uncovered in medication-wide association studies could be plugged into QSAR, Aronoff says. Perhaps something already known about the drug’s structure could point to a causal relationship, a hypothesis which could then get tested more directly in tissue models.

Aronoff’s hypothetical example is an antidepressant drug that gets newly associated with an adverse pregnancy outcome. “Can we keep its antidepressant activity but enhance its safety by targeting the structure that is actually the bad actor?” If so, he says, medicinal chemistry stands to gain some ground.

Linked Patient Records

Another limitation of mining the databases where adverse events are being reported is that “some subtle, infrequent and unexpected relationships” invariably get missed, says Aronoff. Women may be on medications chronically when they give birth to a child with a teratogenic problem or later health problem and “there may be no awareness that those things are related.” It’s unreasonable to expect anyone to make the mental connection when years can separate the drug exposure and unwanted outcome.

Target trials use the power of machine learning to interrogate hundreds of thousands, if not millions, of linked patient records to find the “needles in the haystack,” Aronoff adds. “In some respects, that can be much more sensitive than relying on individual people to report some association where there may need to be an incredibly strong signal or very horrible outcomes that are chronologically associated with the exposure.”

Available adverse exposure reporting information is also mostly freeform text, making it difficult to extract for use in target trial models, says Challa. EHRs, in contrast, are much more structured and minable documents.

Vanderbilt has taken a leadership position in creating meaningful databases out of EHR information, Challa says, including the use of natural language processing to put text fields in a machine-readable format. Its BioVU DNA repository, for instance, consists of high-quality, up-to-date genomics information linked to de-identified medical records and is routinely updated and maintained by a team of on-campus IT experts. Another repository is Vanderbilt’s longstanding Research Derivative, a database of identified health records and related data drawn from Vanderbilt University Medical Center’s clinical systems and restructured for research.

Large databases of linked health records, available mainly at institutions with patient volume and health IT infrastructure similar to Vanderbilt’s (whose clinical databanks contain EHR information for more than 2 million patients), are what make target trials feasible, says Challa. “It is often unethical to create linkages across clinical datasets that don’t already have it.”

Ethical Approach

The proposed target trials framework will take several medication exposures of interest from pregnant patients as input and try to associate them with a battery of developmental outcomes from the EHRs of their children, says Challa. In contrast, clinical trials typically test the potency and safety of one drug for a single disease or cluster of similar diseases.
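The many-exposures-by-many-outcomes design can be pictured as a screen over every drug and outcome pair in a set of linked mother and child records. The sketch below is a minimal version under stated assumptions: the records, the labels "drug_A" and "outcome_X", and the function names are invented, and the unadjusted odds ratio (with the standard 0.5 continuity correction for empty cells) stands in for whatever statistics the MADRE team ultimately adopts.

```python
from itertools import product

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table, adding 0.5 to each cell (Haldane-Anscombe)."""
    a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    return (a * d) / (b * c)

def screen(pairs, drugs, outcomes):
    """Rank every (drug, outcome) pair by odds ratio across linked records.

    pairs: list of (maternal exposure set, child diagnosis set) tuples.
    """
    results = []
    for drug, outcome in product(drugs, outcomes):
        a = b = c = d = 0
        for exposures, diagnoses in pairs:
            exposed = drug in exposures
            affected = outcome in diagnoses
            if exposed and affected:
                a += 1
            elif exposed:
                b += 1
            elif affected:
                c += 1
            else:
                d += 1
        results.append((drug, outcome, odds_ratio(a, b, c, d)))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Invented linked records (assumption): (mother's exposures, child's diagnoses)
pairs = (
    [({"drug_A"}, {"outcome_X"})] * 6
    + [({"drug_A"}, set())] * 4
    + [(set(), {"outcome_X"})] * 2
    + [(set(), set())] * 18
    + [({"drug_B"}, {"outcome_Y"})] * 1
    + [({"drug_B"}, set())] * 9
)

ranked = screen(pairs, ["drug_A", "drug_B"], ["outcome_X", "outcome_Y"])
for drug, outcome, ratio in ranked:
    print(f"{drug} -> {outcome}: OR = {ratio:.2f}")
```

Because a screen like this tests many pairs at once, a real analysis would also need to correct for multiple comparisons and adjust for confounders before treating any pair as a signal.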

By providing a basis for causal inference, target trials are "the only ethical way to gather human drug exposure data for pregnant people on a significant scale and across all classes of drugs," he and his colleagues argue in the Nature Medicine paper.

Within a few years, MADRE researchers hope to be inputting drug lists into a reproducible set of machine learning algorithms and statistical methods and outputting associations to several serious neurodevelopmental diseases, Challa says. Future plans also include taking positive drug-disease associations in pediatric patients and extrapolating the impact of early exposures to their later life course.

“As I like to say to my friends who are physicians and have specialty areas,” Aronoff says, “many people suffer from the diseases they care about, but every human being has experienced childbirth. We have to get that right.”

While pregnant people should be enrolled in RCTs of drugs and vaccines, Aronoff adds, “the reality is that [pregnancy] is always going to be a barrier. Target trials are a way forward.”