AI-Powered Cell Mapping Could Transform Precision Oncology and Accelerate Clinical Trials
By Deborah Borfitz
July 22, 2025 | Scientists at Virginia Commonwealth University (VCU) are tackling the massive computational problem of determining the distinct types of cells inhabiting biopsied tissue. Their endgame is to deploy a suite of interoperable artificial intelligence (AI) tools that can accurately and reliably guide prescribing decisions for patients with cancer as well as predict their response to investigational treatments prior to enrollment, according to Kevin Matthew Byrd, D.D.S., Ph.D., associate research member at the VCU Massey Comprehensive Cancer Center and assistant professor of oral and craniofacial molecular biology at the VCU School of Dentistry.
One of their tools, known as TACIT (short for threshold-based assignment of cell types from multiplexed imaging data), already exists. It is an unsupervised, platform-agnostic AI algorithm which, as recently demonstrated, can use a limited number of cell type-specific features to distinguish 51 labeled cell types based on their gene expression profiles (Nature Communications, DOI: 10.1038/s41467-025-58874-4).
Although the field of spatial biology has matured rapidly only over the last few years, pathologists have been diagnosing disease based on biological structures in their spatial context since around 1875, says Byrd. Typically, the process involves staining tissue specimens to reveal the spatial relationships of different molecules, cells, and structures when viewed under a microscope.
With TACIT, Byrd and his colleague Jinze Liu, Ph.D., a research member at the VCU Massey Comprehensive Cancer Center and a professor in the department of biostatistics at the VCU School of Public Health, have come up with a way to analyze multiplexed slides at scale, uncovering patterns across diseases that can help predict patient outcomes. They’re leveraging high-plex imaging data that can capture dozens, if not hundreds or thousands, of markers simultaneously in a single cell.
“That cell now lives in a network of other cells... [providing a] snapshot of the disease or the tissue in time,” he explains. TACIT makes sense of all that data in a matter of minutes.
That translates into savings not only in the time spent on expensive technologies but also in the human resources it takes to run the assays, says Byrd. A single test can take 24 to 48 hours, or even up to a week, to yield results.
Thanks to advances in compute infrastructure and algorithmic efficiency, TACIT is now being scaled to analyze hundreds of samples in parallel, an appealing feature for pharma companies aiming to accelerate clinical trial recruitment, he adds.
Equally, and perhaps more importantly, TACIT could help predict patient response to immuno-oncology drugs by combining slide and transfer proteomics, used respectively for direct analysis of proteins and for targeted protein studies, says Byrd. Both technologies have ballooned in popularity over the past five years, primarily for transcriptomics (RNA sequencing).
While messenger RNA is a “decent surrogate” for what might ultimately happen to a molecule, the protein is the “for-sure” component, which might be linked to the cytoskeleton giving structure to cells or be operating in the immune system directing other cells “to do certain things or be a certain way,” he continues. TACIT can flexibly link data from the two technologies, even when they come from two different companies’ products, and it doesn’t matter whether the protein and RNA started out on different slides.
This dual-analysis approach has often been used “to mirror whether a patient is a good candidate” for drugs like Keytruda, and it has shown that “many, many times the RNA does not match the protein,” says Byrd. This is because proteins are made at different but often-known rates within a cell, while short-lived RNA can be highly unstable.
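To make “the RNA does not match the protein” concrete, one common way to quantify agreement between two binary readouts is raw agreement plus Cohen’s kappa, which discounts the agreement expected by chance. The sketch below is illustrative only: the function name `positivity_concordance` is hypothetical, and it assumes each cell (or sample) already has a matched positive/negative call for the same marker from both the RNA and the protein assay. It is not a description of how Byrd’s group or TACIT compares the two modalities.

```python
import numpy as np


def positivity_concordance(rna_pos, protein_pos):
    """Compare RNA-based and protein-based positivity calls for one marker.

    Hypothetical helper. rna_pos and protein_pos are boolean sequences with
    one entry per cell (or sample), assumed to be matched to each other.
    Returns (observed agreement, Cohen's kappa).
    """
    rna = np.asarray(rna_pos, dtype=bool)
    prot = np.asarray(protein_pos, dtype=bool)
    # Fraction of cells where both assays make the same call
    observed = np.mean(rna == prot)
    # Agreement expected by chance, given each assay's positive rate
    p_rna, p_prot = rna.mean(), prot.mean()
    expected = p_rna * p_prot + (1 - p_rna) * (1 - p_prot)
    # If chance agreement is already perfect, kappa is taken to be 1
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return float(observed), float(kappa)


# Toy data: RNA calls several cells positive that the protein assay does not
rna = [True, True, True, False, False, True, False, True]
protein = [True, False, True, False, False, False, False, True]
agreement, kappa = positivity_concordance(rna, protein)
# agreement == 0.75, kappa is about 0.53
```

Raw agreement alone can look reassuring when most cells are negative in both assays; kappa is lower whenever much of that agreement could have arisen by chance.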
Mapping the Clusters
In the latest study, TACIT was benchmarked on three public spatial omics datasets comprising nearly five million cells across the 51 cell types to demonstrate its broad applicability as an AI algorithm agnostic to assay, species, organ, and disease. It outperformed three existing unsupervised cell phenotyping methods in accuracy and scalability while also integrating cell types and states to reveal new cellular associations.
TACIT is a self-learning algorithm whose performance autonomously improves over time to “call the positivity of each individual cell type without involving additional external information,” explains Liu. One of the most popular approaches currently is graph-based clustering with computational tools such as Seurat and Scanpy that present a map of cells based on the similarity of their expression.
The problem with this method is that it requires a lot of human time and effort to examine the map for clusters and then work out which cell types are enriched in them, she adds. Not only is the exercise highly subjective, but the map itself, built from high-dimensional data, does not always clearly separate the underlying clusters.
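For readers unfamiliar with graph-based clustering, the core idea is to connect each cell to the cells with the most similar expression and then treat well-connected groups as clusters. The sketch below is a deliberately simplified stand-in: Seurat and Scanpy build nearest-neighbor graphs and then run community detection (such as Louvain or Leiden), whereas this hypothetical `knn_graph_clusters` function just takes connected components of a symmetrized k-nearest-neighbor graph. It also computes all pairwise distances in memory, so it is only suitable for small toy inputs. Deciding which cell type each resulting cluster represents is still left to a human, which is the step Liu describes as time-consuming and subjective.

```python
from collections import deque

import numpy as np


def knn_graph_clusters(X, k=5):
    """Label cells by connected components of a k-nearest-neighbor graph.

    Hypothetical, simplified illustration (not Seurat's or Scanpy's method).
    X is an (n_cells, n_markers) expression matrix; k must be < n_cells.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between all cells (O(n^2) memory)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is never its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    knn = np.zeros((n, n), dtype=bool)
    knn[np.arange(n)[:, None], neighbors] = True
    # Symmetrize: link i and j if either lists the other as a neighbor
    adj = knn | knn.T

    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        # Breadth-first search over the graph to collect one component
        labels[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels


# Two well-separated toy populations of 20 cells with 3 markers each
rng = np.random.default_rng(1)
cells = np.vstack([rng.normal(0.0, 0.5, (20, 3)), rng.normal(10.0, 0.5, (20, 3))])
labels = knn_graph_clusters(cells, k=5)
```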
The “curse of dimensionality” is a major problem when working with spatial proteomics datasets, says Liu, referring to the difficulty of finding meaningful patterns and relationships when analyzing and modeling data in high-dimensional spaces. As the number of features (dimensions) increases, the data become increasingly sparse, and the markers of interest become harder to pick out.
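One well-known symptom of the curse of dimensionality is that distances stop discriminating: as dimensions are added, a point’s nearest and farthest neighbors end up almost equally far away. The sketch below demonstrates this with uniformly random points rather than real marker data; `distance_contrast` is a hypothetical name, and the point counts and seed are arbitrary choices.

```python
import numpy as np


def distance_contrast(n_dims, n_points=500, seed=0):
    """Relative contrast (max - min) / min of distances from one query point.

    Points are drawn uniformly from the unit hypercube. Small values mean the
    nearest and farthest points are nearly the same distance away.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


# The contrast shrinks sharply as the number of dimensions grows
for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(d), 2))
```

When the contrast approaches zero, similarity-based maps have little to work with, which is why focusing on a small, informative subset of features can sharpen the separation between cell types.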
To address this problem, TACIT reduces the dimensionality of the expression data by focusing on relevant subsets of features (subspaces) that better distinguish cell types, achieving a “strong, robust signal” that separates them into clusters, she says. “The goal is to address the scenario when we use all the features and everything looks very similar.” It is a bit like trying to identify individuals with a particular health risk profile within a crowded stadium, where the differences are subtle and buried in noise.
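The general idea of scoring each cell only on the few markers that define a cell type, then applying a threshold, can be sketched in a few lines. This is not TACIT’s published algorithm, which learns its thresholds from the data and handles ambiguous cells in more sophisticated ways. Here the scoring rule (mean expression of a type’s signature markers), the hand-set thresholds, the function name `assign_cell_types`, and the toy marker panel are all assumptions for illustration.

```python
import numpy as np


def assign_cell_types(expr, signatures, thresholds, unknown="Unknown"):
    """Threshold-based cell typing on per-type marker subsets (toy sketch).

    expr: (n_cells, n_markers) normalized expression, e.g. scaled to [0, 1].
    signatures: dict mapping cell type -> list of marker column indices.
    thresholds: dict mapping cell type -> minimum score to call a cell positive.
    """
    expr = np.asarray(expr, dtype=float)
    types = list(signatures)
    # Score each cell only on its type's own markers (a low-dimensional subspace)
    scores = np.column_stack([expr[:, signatures[t]].mean(axis=1) for t in types])
    cutoffs = np.array([thresholds[t] for t in types])
    positive = scores >= cutoffs
    # Among the types a cell passes, keep the one with the highest score
    masked = np.where(positive, scores, -np.inf)
    best = masked.argmax(axis=1)
    return [types[b] if positive[i, b] else unknown for i, b in enumerate(best)]


# Hypothetical marker columns: 0 = CD45, 1 = CD3, 2 = CD20, 3 = PanCK
signatures = {"T cell": [0, 1], "B cell": [0, 2], "Epithelial": [3]}
thresholds = {"T cell": 0.6, "B cell": 0.6, "Epithelial": 0.6}
expr = [
    [0.9, 0.8, 0.1, 0.0],  # immune cell with high CD3
    [0.9, 0.1, 0.7, 0.0],  # immune cell with high CD20
    [0.1, 0.0, 0.1, 0.9],  # PanCK-high epithelial cell
    [0.3, 0.2, 0.2, 0.1],  # weak signal everywhere
]
calls = assign_cell_types(expr, signatures, thresholds)
# calls == ["T cell", "B cell", "Epithelial", "Unknown"]
```

Cells that pass no threshold are left unassigned rather than forced into the nearest cluster, which keeps low-confidence calls visible for review.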
Collaborative Projects
Byrd and Liu have several research projects underway, including a collaboration with Blake Warner, D.D.S., Ph.D., a researcher with the National Institutes of Health who specializes in Sjögren's disease, a chronic autoimmune condition affecting multiple organs. TACIT is being used for multi-omics analysis of various organs from the same individual to look for common biomarkers.
The hope here, says Byrd, is to understand the pathways across the entire body that keep patients healthy. The resulting map of cells could then be coupled with AI and tested to learn whether it helps predict and address the critical needs of individuals.
In another collaboration with Siddharth Sheth, D.O., a medical oncologist at the University of North Carolina (UNC) at Chapel Hill, Byrd and Liu conducted a pilot project using TACIT to stratify patients who did and did not respond to an inhibitor drug. The algorithm quickly found the signature of individual cells and groups of cells behaving like coordinated units, “flying in formation,” suggesting emergent properties of the tumor microenvironment.
In this case, the shape of the cell became an emergent new biomarker for predicting treatment response, says Byrd. Sheth now has additional trials underway using the multi-omics approach, including ones for subcutaneous skin cancer and others (with Merck) for head and neck cancer that involve different treatment arms.
“We are starting to be engaged with pharma on prospective trials at the time of the recruitment and to think about how we can validate some of these signatures in different cohorts to ensure the biomarkers we find are demonstrable to the responses we’re trying to predict,” he says.
The overall spatial biology field is still in the discovery phase and moving toward the translational stage, adds Byrd. “We are really pushing as hard as we can that way, but we must be careful about what the parameters are for standardization [of the employed technologies]. Each piece of equipment has to go through a rigorous process to be associated with a [prospective] clinical trial.”
Collaboration with industry leaders will be key to marrying up the tools with TACIT so there is an established way to package data and validate some of the targets and probes used to interrogate the spatial distribution of molecules within individual cells and tissues, he says. “This ecosystem of standardization, spanning instrumentation, reagents, and computational workflows, will be essential to achieving FDA [Food and Drug Administration] approval and bringing spatial diagnostics into regulated clinical use.”
The list of technology vendors has grown rapidly over the past five years, from perhaps four or five to 30 or more currently. TACIT can adapt to whatever system a biotech chooses to have at the translational clinical interface and can just as readily accommodate companies of any size that might be repurposing older technology, says Byrd. “It’s really a question of finding out who those right partners are, so they standardize, and [help] move the technology... towards the FDA for approval.”
TACIT itself will ultimately need FDA approval and must demonstrate, most importantly, that it accurately and reliably identifies the cell types it is asked to find, says Byrd. “We have pathologists on our team who verify the cell calls,” which will necessarily differ based on the organ and microenvironment under scrutiny.
‘Scratching the Surface’
The 51 cell types distinguished in the latest study represent three niches (brain, intestine, and salivary gland) characterized by the Human Cell Atlas, an international initiative ongoing since 2016, says Byrd. The human body has an estimated 40 trillion cells, representing perhaps a couple of thousand cell types that “define us as people,” which more than 3,000 scientists from over 100 countries are now actively working to label. Close to 150 million single cells have been profiled to date, and thousands of cell types and cell states have been annotated from them, which is “still scratching the surface.”
As this comprehensive catalogue develops, TACIT will use all the relevant cell-level data to help sub-stratify patients with different disease subtypes based on the tumor microenvironment, Byrd continues. The algorithm could allow scientists to see at one time the features affecting tumor growth, metastasis, and response to therapy, e.g., when a cancer is becoming resistant to a drug, hiding from the immune system, or carrying viruses, or when the body’s defense system has been exhausted, to inform next steps and precision treatment. “There are all kinds of ways for us to start leveraging already-approved FDA drugs as well as things in the pipeline to start marrying those to the signatures we see in the tissues.”
But in terms of the applicability of TACIT to the generalized human population, the model remains “incredibly underpowered... when it comes to considering biological sex, age differences, and different backgrounds and genetic ancestry,” says Byrd. “We need to ensure this next generation of AI tools is trained and validated on datasets that reflect real-world patient populations.”
Efforts in that direction are ongoing with partners at UNC, Duke University, and the University of Pennsylvania, as well as internationally in India and elsewhere, he adds. TACIT’s ability to link cell-type data together is one of its unique features and, much like Legos, enables the building of ever-larger structures. “We just don’t [yet] know... how good it would work with samples from a rare cancer subtype or a much older individual with a bunch of comorbidities who might have obesity, diabetes, and cancer, and may also have been a smoker but stopped 10 years ago.”
Starting Point
“Getting it right” will involve linking up the TACIT-identified cell types as much as possible to clinical metadata in electronic health records, Byrd says. The starting point will likely be looking at a widely applicable phenomenon such as drug toxicity before getting to an intervention trial that would involve digitizing a mountain of biopsies. “We’re getting really good at considering the influence of a person’s health history and... [determining] whether they should have a drug and at what dose [level].”
“The future of biomedical research and life science research in general will rely on a lot of multi-modality, high-dimensional data,” says Liu. Computational tools are critical to interpreting the outputs of generative models and linking micro-level information from cells to clinical outcomes, thereby realizing some of the promise of precision medicine.
“This is the most exciting time in biomedical research that I’ve ever had a chance to be part of,” enthuses Byrd. “Five years ago, when I was first putting my lab together, things that we thought might happen in 15 or 20 years are already happening.”
But building the knowledge base and getting patient samples into a digital format and through drug pipelines requires significant investment at the academic, industry, and AI data center levels. Federal dollars are “incredibly necessary” to support multi-partnership collaborations, as is the interest of people everywhere in exploring what is possible with AI.