Recognize, Recall & Refine: Eliminating Bias in Computer Vision

Contributed Commentary by Ed Ikeguchi, AiCure

December 17, 2021 | More and more, we’re seeing AI’s ability to transform drug development. It plays an increasing role in evaluating how patients respond to treatment during a clinical trial and in extracting meaningful insights that can improve drug development and bring drugs to market faster. But the power of AI lies in the quality and quantity of its data backbone, so consistently evaluating that data to ensure it represents all patient populations is crucial to its success. The pandemic exposed longstanding inequities in healthcare, underscoring the imperative to help AI foster inclusivity and reach its greatest potential to improve the lives of patients worldwide.

However, when data is not representative of all skin colors or appearances, visual AI or computer vision that seems accurate in research can negatively impact people of different races, genders, and ages when applied in real-world scenarios. All players in the pharmaceutical industry are therefore responsible for thoroughly vetting the algorithms they deploy to make sure those algorithms are strong enough to work outside of a controlled environment. We can do this by, first, recognizing the consequences that can occur if this doesn’t happen; second, implementing protocols and incentives that ensure data sets aren’t biased; and third, encouraging the refinement of insufficient algorithms until they are sound.

Long- And Short-Term Consequences Of Poor AI

The consequences of using insufficiently governed and improperly trained AI in clinical trials can range from relatively trivial impacts on research operations to detrimental impacts on patient outcomes, depending on where and how it’s deployed.

On the less harmful end of the spectrum, a non-optimized AI system helping to identify optimal investigator sites might lead to a slight skew in the quality of sites chosen to participate. A more concerning challenge is that faulty AI used in a decentralized trial to gauge patient engagement could result in an inordinate number of early terminations or an overall inaccurate assessment of patient enrollment rates, which could cause a study to fail. Further, if the computer vision used in a clinical trial doesn’t work on patients with darker skin, there could be serious long-term consequences when that drug reaches real-world use: patients with darker skin would be taking a drug that wasn’t vetted for them the way it was for white patients, which could lead to adverse outcomes for those populations.

Not only for efficient operations, but for the sake of ethics and sound drug development, we need to establish processes that safeguard the quality of our datasets and confirm our data is representative of the broader population.

Implementing Protocols To Detect Bias

Today’s AI developers still consistently lack access to large, diverse data sets, often training algorithms on small, single-origin data samples with limited diversity. And once an AI solution is approved from a regulatory perspective, there are limited protocols in place to assess how it performs in the real world. Consider this: when a new drug is approved and given to thousands of patients outside of a clinical trial, it’s common for unexpected side effects to arise that didn’t occur during the research phase. Just as there is a process to recall and reassess that drug, there should be a similar checks-and-balances protocol for AI, one that detects inaccuracies in real-world scenarios, such as computer vision that doesn’t work for certain skin colors, and other biases.

Algorithms are only as good as the hypotheses driving them and require systematic re-evaluation as you move from a controlled research environment to real-world populations. In particular, the training and test data used to build AI algorithms need to be continually evaluated to make sure they represent the real-world patient populations where the AI will be applied. Bias can be caught as early as the planning stage. Depending on the use case, AI developers should always test in line with regulatory standards and be careful not to assume that off-the-shelf open-source software fits every population and disease state. Independent audits, both internal and external, are an option, but the customer or end user ultimately must assess an AI tool’s suitability for their needs.
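As a concrete illustration, here is a minimal sketch of what such a representativeness check might look like, assuming training-sample metadata with a hypothetical skin-tone column and a target distribution drawn from the intended real-world patient population. The column name, categories, and 5-percentage-point threshold are illustrative assumptions, not a standard.

```python
# Minimal sketch of a dataset-representativeness check (illustrative only).
# Assumes a pandas DataFrame of training samples with a hypothetical
# "skin_tone" column and a target distribution for the intended population.
import pandas as pd

def representativeness_report(train_df: pd.DataFrame,
                              target_dist: dict,
                              column: str = "skin_tone",
                              max_gap: float = 0.05) -> pd.DataFrame:
    """Flag demographic groups that are under- or over-represented in the
    training data relative to the target population."""
    observed = train_df[column].value_counts(normalize=True)
    rows = []
    for group, expected in target_dist.items():
        actual = float(observed.get(group, 0.0))
        rows.append({
            "group": group,
            "expected_share": expected,
            "observed_share": round(actual, 3),
            "gap": round(actual - expected, 3),
            "flag": abs(actual - expected) > max_gap,
        })
    return pd.DataFrame(rows)

# Hypothetical usage with made-up file and numbers:
# train_df = pd.read_csv("training_metadata.csv")
# target = {"I-II": 0.35, "III-IV": 0.40, "V-VI": 0.25}
# print(representativeness_report(train_df, target))
```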

A review process for algorithms should be implemented to catch unexpected results, especially since algorithms should be continually evolving and learning as they’re fed more and more data. As an industry, we need to become more skeptical of AI’s conclusions and encourage transparency. We should be consistently asking ourselves and others: How was the algorithm trained? On what basis did it draw this conclusion? When algorithms aren’t working properly across the entire population, companies should enact protocols so employees are aware of the problem and encouraged to re-develop the algorithms to better represent all populations. To further validate computer vision algorithms, developers can include people with a variety of backgrounds and skin tones during development and encourage participants to wear hats, sunglasses, or different types of clothing and to record themselves under varying lighting conditions, so that the AI focuses on the individual person regardless of appearance. This will lead to stronger algorithms and more accurate, fairer outcomes. Only once we interrogate and constantly evaluate an algorithm under both common and rare scenarios, with varied populations, will it be ready for introduction into real-world situations.
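To make the idea of such a review process concrete, below is a minimal sketch of a subgroup performance audit, assuming per-sample predictions and labels alongside metadata such as skin tone and lighting condition. The accuracy metric, grouping columns, and disparity threshold are illustrative assumptions rather than an established protocol.

```python
# Minimal sketch of a subgroup performance audit for a vision model
# (illustrative only). Assumes a DataFrame with per-sample "prediction",
# "label", "skin_tone", and "lighting" columns.
import pandas as pd

def subgroup_audit(results: pd.DataFrame,
                   group_cols=("skin_tone", "lighting"),
                   min_n: int = 30,
                   max_disparity: float = 0.05) -> pd.DataFrame:
    """Compute accuracy per subgroup and flag groups that fall more than
    `max_disparity` below overall accuracy (ignoring very small groups)."""
    overall = (results["prediction"] == results["label"]).mean()
    audit = (
        results.assign(correct=results["prediction"] == results["label"])
               .groupby(list(group_cols))
               .agg(n=("correct", "size"), accuracy=("correct", "mean"))
               .reset_index()
    )
    audit["overall_accuracy"] = overall
    audit["flag"] = (audit["n"] >= min_n) & (audit["accuracy"] < overall - max_disparity)
    return audit

# Hypothetical usage with evaluation output from a holdout set:
# results = pd.read_csv("holdout_predictions.csv")
# print(subgroup_audit(results).sort_values("accuracy"))
```

A flagged subgroup would then trigger the kind of re-development described above, such as collecting additional footage from under-represented participants or under the lighting conditions where performance drops.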

Recognition Is The First Step Toward Progress

In many ways, the industry still lacks a fundamental recognition that different complexions and appearances need to be incorporated into algorithms for the technology to work effectively, and raising awareness is the first step. The use of AI is soaring, but as the pandemic has taught us, the impact of innovative technologies is limited if they don’t prioritize fairness and equality. We must work to ensure the technology our patients and pharmaceutical companies use has the foundations it needs to foster equality and reach its potential.


Edward F. Ikeguchi, M.D., is the Chief Executive Officer at AiCure. Prior to joining AiCure, he was a co-founder and Chief Medical Officer at Medidata for nearly a decade, where he also served on the board of directors. Dr. Ikeguchi served as assistant professor of clinical urology at Columbia University, where he gained experience using healthcare technology solutions as a clinical investigator in numerous trials sponsored by both commercial industry and the National Institutes of Health. Dr. Ikeguchi holds a B.S. in chemistry from Fordham University and an M.D. from Columbia University's College of Physicians & Surgeons, where he also completed his surgical internship, subspecialty training, and fellowship. He can be reached at ed.ikeguchi@aicure.com.