Data is Both the Fuel and Downfall of AI in Drug Development

By Deborah Borfitz

May 21, 2026 | High-impact opportunities for leveraging artificial intelligence (AI) across the pharmaceutical research and development (R&D) cycle are too numerous to count but won’t “automagically” solve any of the longstanding problems in ushering molecules to market. AI requires human attention to match tools to tasks and ensure the accuracy of model predictions, and the pitfalls are dangerously easy to miss, according to Faisal Khan, Ph.D., corporate vice president of artificial intelligence and analytics at Novo Nordisk, speaking at this week’s Scope X conference in Boston.

In every functional area of the drug-making business—from operations and finance to human resources and the supply chain—AI now plays a role. “We have the burden of helping patients and, when we do bad AI, we hurt patients’ lives,” says Khan in stressing that AI be used in the “most robust manner possible.”

Khan likens innovation in the field to a fast-moving train with infrequent stops to ponder whether the tools being deployed are appropriate for the problems at hand. The stakes are higher than in other industries, he says, because “what we’re doing at the end of the day is changing how somebody’s body works—hopefully, to help them live a better, healthier life. There is no greater responsibility than that.”

The challenge starts with the data, which is both awesome and a mess. “If we don’t understand that then we’re not ready to deal with the consequences,” says Khan. “We’ve come to terms with the volume of data ... but the variety of data still confounds us.”

Data from imaging, wearable devices, omics analysis, and unstructured physicians’ notes comes from different systems and platforms at different times, at lightning speed, he points out. The central quest is detecting a “real signal” amidst all the background noise.

AI’s Evolution

AI is all about data and building models to, for example, determine how many patients will develop type 2 diabetes based on how much exercise they do, says Khan. Models are “a little bit erroneous,” suffering as they do from generalization errors. Foundational statistics have long been used to measure and minimize the gap between how well a model performs on training data versus new, unseen data.
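The gap between training and unseen-data performance can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not something presented in the talk: the data is synthetic, the linear link between exercise hours and risk is a toy relationship, and the nearest-neighbor predictor is chosen only because it memorizes its training set.

```python
import random

def make_data(n, rng):
    """Synthetic (exercise_hours, risk_score) pairs; not real patient data."""
    data = []
    for _ in range(n):
        hours = rng.uniform(0.0, 10.0)
        # Assumed toy relationship: more exercise, lower risk, plus noise.
        risk = 0.8 - 0.05 * hours + rng.gauss(0.0, 0.1)
        data.append((hours, risk))
    return data

def predict_1nn(train, hours):
    """Predict the risk of the closest training point (pure memorization)."""
    return min(train, key=lambda point: abs(point[0] - hours))[1]

def mean_squared_error(train, data):
    """Average squared error of the 1-nearest-neighbor predictor on `data`."""
    return sum((predict_1nn(train, x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(200, rng)
test = make_data(200, rng)  # stands in for "new, unseen data"

train_error = mean_squared_error(train, train)
test_error = mean_squared_error(train, test)
generalization_gap = test_error - train_error

print(f"train MSE={train_error:.4f}  test MSE={test_error:.4f}  gap={generalization_gap:.4f}")
```

Because the predictor memorizes its training points, its training error is zero while its error on fresh data is not. That difference is the generalization gap that held-out evaluation is meant to expose.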

The inspiration for AI was “the greatest learning machine we know, which is the human brain,” Khan says. It aims to simulate real biological neurons, the cellular entities that make learning possible.

In a neural network, where the human brain is being modeled, the inputs and outputs are strictly mathematical, he continues. The colossal number of cells in the brain has been a key obstacle, up until about five years ago when deep neural networks emerged to enable the training of AI to solve specific real-world tasks.

This led to the modern notion of large language models (LLMs), “really big neural networks” that initially predicted text and later abstract concepts involving reasoning, real-world actions, and scientific outcomes. LLMs today process multiple types of images and data, Khan notes.

LLMs have been connected to various data sources to help with basic decision-making and supercharge workflows. The field then evolved to employ AI agents, taking the sophistication of question prompting “a click higher ... [to] derive some new insights that might not have been obvious before,” says Khan, fueling an explosion in the use of agentic AI over the last two years.

These days, when people talk about AI, they are most often talking about agentic AI, Khan says. However, “there are still a lot of application areas where you might be using traditional machine learning approaches ... [or] generative AI and large language models without an agentic layer on top. It’s important to recognize what’s the right tool [for the problem], not if you have a hammer everything looks like a nail.”

Many Applications

A host of opportunities exist in pharmaceutical R&D to use AI and generative AI “across the entire value spectrum,” says Khan, “from preclinical research through clinical and post-launch activities.” AI has been of some utility in the R&D lifecycle for about 20 years now.

At Novo Nordisk, like any large pharmaceutical company, “hundreds if not thousands of AI applications” are in play, he reports. “For us, what the conversation comes down more to these days is the scientific versus operational ... and the regulated and non-regulated sense [of AI].”

The first stop for AI on the value chain is to design new molecules that hit previously unknown or hard-to-treat biological targets. Researchers, startups, and established companies are relying heavily on generative and agentic AI to not only identify new targets but also design the next generation of small molecules, peptides, and biologics, he says. In the molecular design chemistry space, in particular, artificial neural networks are creating chemical compounds “that human minds just can’t.”

AI applications specific to imaging are using essentially the same technology as self-driving cars that can identify roads, parked cars, and pedestrians, says Khan. They’re designed to analyze not only immunohistochemistry images but also other modalities such as MRIs, CT scans, and X-rays.

“What’s interesting is the algorithm is almost consistently wrong; a human is also wrong but not consistently wrong,” he adds. “An AI algorithm will always give you the same results, right or wrong. But then you can start to characterize ... the instances where it’s wrong” and the situations where the AI approach can be trusted. “I’m less concerned about being correct all the time. I am very concerned about being sure when I’m correct” to confidently stand behind the model’s predictions.

Many AI applications fall in the category of digital health and wearables to track everything from how people run, breathe, and sleep to how cancer is metastasizing or a drug is working, says Khan. These can help both with diagnosing a disease like Parkinson’s and with determining the drug dose to treat it.

AI is likewise being used in clinical trial operations to, for instance, generate an adverse event notice if an enrolled patient visits the ER for a study-related reason versus a twisted ankle, he continues. Novo Nordisk is looking to leverage digital twins and causal inference to simulate phase 3 trials to predict how they will work out—e.g., type of adverse events, probability of success, the responders and non-responders—before spending perhaps $300 million on an actual study. For hard-to-enroll populations, such as people with rare diseases or kids with cancer, the company may also simulate virtual patients to enable the launch of a new drug to the market.

After a trial concludes, it can sometimes be insightful to analyze how patients in the control arm would have responded had they been in the intervention arm, or vice versa, says Khan. These “virtual response” predictions can help drug developers understand safety signals, efficacy, tolerances and other critical pieces of information that regulatory bodies are interested in. Virtual twins can also help companies fill in missing data with synthetic simulations, which is currently a major use case for AI in clinical trials.

Timesaving on Tasks

AI also has a role in assessing manufacturing quality, points out Khan. “Almost every large-scale manufacturer is using computer vision systems to look at quality” as well as predict emerging defects and supply chain disruptions in and outside the clinical trials space.

Many AI-powered operational data science solutions are now available for investigator ranking, trial optimization and forecasting, site selection, and assessment of patient burden, all of which provide actionable information for making studies run better, faster, and more efficiently, Khan says. The goal is typically to conduct a study quickly, at the lowest cost, and at the highest quality, he adds, but companies are lucky to routinely achieve one of those objectives.

AI could help the pharma industry improve its odds of hitting all three clinical trial aims more often. Novo Nordisk has been experimenting with the Claude Code tool to generate R- and SAS-based code for converting Study Data Tabulation Model (SDTM) data into Analysis Data Model (ADaM) datasets for its regulatory submissions. That’s something biostatisticians currently spend a lot of time and energy programming, he notes.

Clinical study reports and clinical trial protocols likewise take a long time to author manually, and at least some of the work could be offloaded to generative AI platforms built on large language models, says Khan. It would save medical writers time and energy they could devote to more significant and impactful undertakings.

AI approaches are useless without data, so companies are now harmonizing their own clinical trial data sources for secondary uses. At Novo Nordisk, the platform for doing this is known as FounData. The platform enables the company to not only analyze what happened in previous studies but also layer an agentic framework on top to start asking questions in a highly intuitive manner, much like ChatGPT, says Khan.

These include questions such as how many patients from what type of population enrolled in studies over the last two years and how many dropped out, which endpoint was hard to meet, and which safety signal was detected in the last four studies, he offers as examples. “The agentic layer on top of that foundation of data helps us be smarter in trial design. Before we get a whole bunch of people in a room spending six months designing a trial, we can get the first draft ... because we’ve seen what has historically happened. Past performance doesn’t always predict future results, but it helps us get a lot closer to it.”

Among large pharma companies, their own trials are the biggest source of data. But if they don’t invest in developing that data foundation, it’ll be hard to tap into the insights of generative AI, says Khan.

Data Issues

One of the biggest pitfalls in working with AI tools in pharmaceutical R&D is “thinking all data is true,” says Khan. “Data might sometimes lie, or we might not realize how to ... understand what the data is telling us. We might not realize that we’re combining signals from a wearable device that was calibrated according to different firmware than the one that just got upgraded, [or] we might not realize that we’re combining a biomarker coming off two different sequencing machines or maybe two different labs that use different reagents.”

It pays to be cautious with data, he says, citing an extremely dangerous water molecule known as dihydrogen monoxide (DHMO) that is used in a lot of cancer research. “It’s a product of industrial waste, everyone exposed to it has died, and I’m surprised it is not ... more famous.” The problem stems from the fact that dihydrogen monoxide is commonly known as water (H2O), which no one worries about. His point is that facts, with and without context, will lead to different conclusions.

Drawing from his own career, Khan recalls how a perfectly good AI model produced outcomes related to the day samples were processed in a lab rather than the detection of cancer. In this case, the issue was that the samples coming in earlier from a large academic center were used to build the model while samples coming in later from community sites were used to validate it. The main academic site had much better labs and more recent, high-risk patients than the smaller community labs.

“That model did not hold up very well because we didn’t think through those implications,” says Khan. It was also a business problem, he adds, since “no one was going to pay me to wait around for eight months for [all] the data to come in.” Researchers were also capturing digital pathology images, never considering the impact of camera upgrades that changed the underlying spectral mechanics of the histology images being read by the model.
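One way to catch this kind of confounding before a model is built is to check whether each site contributes to both the training and the validation data, and whether outcome rates differ sharply between the two. The sketch below is illustrative only: the sample manifest, site names, and labels are hypothetical, and the check is an assumed safeguard rather than the method Khan's team used.

```python
from collections import defaultdict

# Hypothetical sample manifest: early samples from one academic center were used
# for training, later samples from community sites for validation.
samples = (
    [{"site": "academic_center", "split": "train", "label": label} for label in (1, 1, 1, 0)]
    + [{"site": "community_site", "split": "validation", "label": label} for label in (1, 0, 0, 0)]
)

def sites_in_one_split_only(samples):
    """Return sites whose samples all fall into a single split."""
    splits_by_site = defaultdict(set)
    for sample in samples:
        splits_by_site[sample["site"]].add(sample["split"])
    return sorted(site for site, splits in splits_by_site.items() if len(splits) == 1)

def positive_rate_by_split(samples):
    """Fraction of positive labels in each split."""
    totals, positives = defaultdict(int), defaultdict(int)
    for sample in samples:
        totals[sample["split"]] += 1
        positives[sample["split"]] += sample["label"]
    return {split: positives[split] / totals[split] for split in totals}

confounded = sites_in_one_split_only(samples)
rates = positive_rate_by_split(samples)

print("Sites confined to a single split:", confounded)
print("Positive rate by split:", rates)
```

When every site appears in only one split, any difference the model learns may reflect the site, its lab, or its processing date rather than the disease. A large gap in positive rates between splits is a further warning sign.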

On another study dealing with data at scale, little red blobs were appearing on digital slides that an AI model classified as red blood cells that were highly predictive of cancer. “We came up with all sorts of rationalizations for this, maybe just more blood flowing,” says Khan. It took a human pathologist to determine that the presumed red blood cells were in fact “a red dye used to stain the tissue samples ... totally random junk.”

It is important to remember “data is the fuel of AI, but data can also be our downfall,” says Khan. “We have to think about the data in the right way, ask the right questions, understand the baggage that the data comes with and the dependencies that it has.”

Context Matters

There are also algorithm-related issues to contend with, says Khan, recalling the excitement several years ago when a deep learning model was developed that could distinguish images of wolves and dogs with roughly 98% accuracy. Research into what was driving the predictions found that they had nothing to do with the animals in the picture and everything to do with the surrounding background. “Outside scenes of grass or snow were being classified as wolves and indoor scenes of carpet or hardwood floor were being classified as dogs.”

Applied to the domain of breast cancer research, this might logically raise questions such as whether a model developed for women over the age of 50 could be used for women under the age of 50, or if the digital histology would be the same for men with prostate cancer, he says.

Several years ago, a lawyer blindly trusted ChatGPT to do his legal filings and ran into some serious hallucination problems, Khan continues. Similarly, operational efficiencies being sought through AI-generated code might instead trip people up. “One of the things we’re finding out is that people using data to train new datasets are using AI to do that, so there’s this bad feedback loop going on here that we might not even be aware of,” he adds.

“At the end of the day, context is everything,” Khan says. “We might ask an agentic AI system why patients are struggling with our pen ... but a pen means something very different to a diabetes company like Novo Nordisk than the conclusions that the original AI agent suggested.” By asking more questions, and giving the AI program more information, “the results that we get are more relevant, more reliable, and more insightful.”

While developers have significantly reduced hallucination rates of AI systems, he says, “we’ve dug ourselves into a deeper hole now because ... [hallucinations] are becoming a bit more difficult to detect unless you’re a domain expert.” Ask an agentic system to explain how insulin is secreted from pancreatic beta cells, and it will generate a result that makes sense to most people but not to those working in the field who know that it is scientifically wrong.

“If we don’t have that context, if we don’t have that oversight, if we don’t think through how we’re using AI and what data it’s built on, we might have some flashy results that look good, that initially seem to pass the smell test, but then they fall apart later,” Khan says.

In terms of the most important challenge that Novo Nordisk is facing in scaling AI across the organization, it’s that an AI model is “not perfect for all time,” Khan says in conclusion. “Very few of us are going back to looking at a model once it has launched to check for drift.”

He ends with an incident from a prior role, before the start of the pandemic, when his team launched a clinical trial prediction tool. When COVID hit, the model was no longer relevant because how patients were enrolled in the summer of 2021 was very different from how they were enrolled only two years earlier. “Models might be good for the data they had, but they might not be relevant anymore.”
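Checking a launched model for drift can start with something as simple as comparing the distribution of an input feature at training time against what the model sees today. The sketch below uses the Population Stability Index (PSI), one common drift statistic. It is an assumed illustration, not a description of Novo Nordisk's tooling, and the monthly enrollment figures are synthetic.

```python
import math
import random

def psi(reference, current, n_bins=10):
    """Population Stability Index of `current` against `reference` for one numeric feature."""
    low, high = min(reference), max(reference)
    width = (high - low) / n_bins  # assumes the reference values are not all identical

    def bin_fractions(values):
        counts = [0] * n_bins
        for value in values:
            index = int((value - low) / width)
            index = min(max(index, 0), n_bins - 1)  # clamp values outside the reference range
            counts[index] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(count / len(values), 1e-6) for count in counts]

    ref, cur = bin_fractions(reference), bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

rng = random.Random(0)
# Synthetic monthly enrollment counts per site; values are illustrative only.
training_era = [rng.gauss(20, 4) for _ in range(1000)]
same_era_later = [rng.gauss(20, 4) for _ in range(1000)]
pandemic_era = [rng.gauss(8, 4) for _ in range(1000)]

psi_stable = psi(training_era, same_era_later)
psi_shifted = psi(training_era, pandemic_era)

print(f"PSI, similar conditions: {psi_stable:.3f}")
print(f"PSI, shifted conditions: {psi_shifted:.3f}")
```

A frequently cited rule of thumb treats a PSI above roughly 0.25 as a substantial shift worth investigating, though the right threshold depends on the application. Running a check like this on a schedule is one way to notice that a model is no longer relevant before its predictions are relied upon.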