Addressing Bias in Imaging AI to Improve Patient Equity
R&E Foundation grant project shows how synthetic data can help improve medical imaging AI


While AI holds immense potential to revolutionize health care and radiology, significant barriers still stand in the way of widespread adoption.
These barriers include reduced performance when a model encounters unfamiliar data and uneven results across patient groups, with bias rarely assessed in either published research or FDA-approved studies.
Evaluating AI for bias is also challenging because there is no universal agreement on what constitutes a biased model. In addition, monitoring and evaluating for bias require diverse datasets and may not be feasible for models with commercial origins due to their “black-box” nature.
In a 2023 R&E Foundation Emerging Issues Grant, Judy W. Gichoya, MBChB, MS, associate professor in the Department of Radiology and Imaging Sciences at Emory University School of Medicine and co-director of the Healthcare AI Innovation and Translational Informatics (HITI) Lab, and colleagues sought to address bias in AI.
First, the researchers conducted external validation for three image-based prediction algorithms for breast cancer, knee osteoarthritis and atherosclerotic disease risk from chest X-rays. Then, they aimed to develop and apply bias-detection techniques to these models to understand their performance during external validation on a diverse dataset.
“We want to make sure that AI works for everyone and this is because AI does not know how to say, ‘I do not know,’” Dr. Gichoya said. “Every time, it's going to make a prediction and that's why we must really understand where AI fails, and especially if we see a systematic failure in a specific subset of the population. That's where bias comes in.”
Improving Model Fairness Through Synthetic Data Generation
To reduce the burden of manually sorting and labeling thousands of medical images, the researchers used a denoising diffusion probabilistic model (DDPM). DDPMs are generative models that work iteratively by adding noise to an input signal and then learning to reverse this process—denoising—to generate new samples.
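To make the mechanism concrete, here is a minimal sketch of the DDPM forward (noising) process and training objective in Python, assuming a standard linear noise schedule; it illustrates the general technique, not the HITI Lab's actual implementation.

```python
import torch
import torch.nn.functional as F

# Minimal DDPM sketch: a linear beta (noise) schedule, the closed-form
# forward process q(x_t | x_0), and the noise-prediction training loss.
# Hyperparameters are common defaults, not the study's values.

T = 1000                                       # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)          # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)      # cumulative signal level

def add_noise(x0, t):
    """Sample x_t from q(x_t | x_0) in one step."""
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)    # broadcast over (B, C, H, W)
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise, noise

def ddpm_loss(model, x0):
    """Train the network to predict the noise that was added."""
    t = torch.randint(0, T, (x0.shape[0],))
    xt, noise = add_noise(x0, t)
    return F.mse_loss(model(xt, t), noise)
```

At sampling time, the trained network runs in reverse, starting from pure noise and denoising step by step to produce a new image.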
This AI tool was trained using a large chest X-ray dataset called CheXpert, then tested on two additional datasets to make sure it performed well across different populations.
Before training, all images were standardized to the same size and intensity range. The model learned to generate realistic-looking chest X-rays conditioned on key patient characteristics such as age, sex, race and disease status.
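The article does not detail the preprocessing pipeline; a minimal sketch of that kind of standardization, with placeholder image size and normalization constants, might look like this:

```python
from torchvision import transforms

# Hypothetical preprocessing: resize every X-ray to a common size and
# standardize pixel intensities. The 256x256 size and the 0.5 mean/std
# are illustrative placeholders, not the study's actual settings.
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),                # same spatial size
    transforms.Grayscale(num_output_channels=1),  # single-channel X-ray
    transforms.ToTensor(),                        # scale pixels to [0, 1]
    transforms.Normalize(mean=[0.5], std=[0.5]),  # standardize intensity
])
```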
To better understand how the model’s settings affected performance, researchers created three sets of synthetic X-rays using different levels of guidance, which control how closely the generated images match the specified patient characteristics.
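The article does not name the guidance mechanism; the sketch below assumes classifier-free guidance, a common approach in which a single scale parameter trades conditioning fidelity against sample diversity. All names are illustrative.

```python
# Classifier-free guidance at sampling time (hypothetical names).
# `model` predicts noise from the noisy image x_t, timestep t, and a
# conditioning vector (e.g., age, sex, race and disease status).

def guided_noise(model, xt, t, cond, null_cond, guidance_scale):
    eps_cond = model(xt, t, cond)          # condition-aware prediction
    eps_uncond = model(xt, t, null_cond)   # unconditional prediction
    # A larger guidance_scale pushes samples to match the requested
    # patient characteristics more closely, at some cost to diversity.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```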
In the second part of the study, the team tested whether machine learning models trained on real or synthetic data could fairly and accurately detect diseases like cardiomegaly or pneumothorax.
They used several advanced techniques designed to reduce bias, especially bias related to patient demographics such as race. These included methods that removed demographic data from the training process, methods that re-weighted data from underrepresented groups, and methods aimed at improving overall performance.
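As one concrete example of these ideas, the sketch below shows simple inverse-frequency re-weighting, in which samples from underrepresented groups contribute proportionally more to the training loss; the group labels and loss are placeholders, not the study's code.

```python
import torch
import torch.nn.functional as F
from collections import Counter

def group_weights(groups):
    """One weight per sample, inversely proportional to group frequency."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return torch.tensor([n / (k * counts[g]) for g in groups])

def reweighted_loss(logits, targets, groups):
    """Binary cross-entropy with per-sample demographic re-weighting."""
    per_sample = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    return (group_weights(groups) * per_sample).mean()
```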
They also used transfer learning, a machine learning technique in which a model developed for one task is reused as the starting point for a different but related task, to test whether any of the models were unintentionally learning and relying on sensitive demographic details.
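One way to run such a check is a linear “probe”: freeze the disease model's feature extractor and train only a small head to predict a demographic attribute; if the probe succeeds, the features likely encode that attribute. The sketch below illustrates the idea with hypothetical names.

```python
import torch
import torch.nn.functional as F

def make_probe(backbone, feat_dim, n_groups):
    """Freeze the feature extractor; train only a new linear head."""
    for p in backbone.parameters():
        p.requires_grad = False            # reuse learned features as-is
    return torch.nn.Linear(feat_dim, n_groups)

def probe_step(backbone, head, optimizer, images, group_labels):
    """One training step of the demographic probe."""
    with torch.no_grad():
        feats = backbone(images)           # frozen representation
    loss = F.cross_entropy(head(feats), group_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```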
To evaluate performance, the team looked at accuracy, error rates and how fairly each model worked across different patient groups. They also tested how well the models performed when applied to five new, external datasets, measuring fairness under real-world conditions. Comparing each method to an ideal model, called an “oracle,” they assessed which approaches minimized bias most effectively.
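A simple version of this subgroup evaluation, computing AUROC per demographic group and reporting the largest between-group gap as a fairness summary, might look like the following sketch (placeholder inputs, not the study's evaluation code).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auroc(y_true, y_score, groups):
    """Per-group AUROC plus the largest between-group gap."""
    per_group = {}
    for g in np.unique(groups):
        mask = groups == g
        per_group[g] = roc_auc_score(y_true[mask], y_score[mask])
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap
```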
“We found that AI is very good at learning the characteristics of the underlying dataset. So, then we could come back and ask it, ‘Generate a dataset of only Black patients?’ ‘Generate only patients with cardiomegaly?’ Or ‘Generate patients with a support device?’” Dr. Gichoya said. “We were able to successfully show that AI generated synthetic data is a tool for people to use to thoroughly evaluate a model. Now, a new model can be created to generate subsets of clinical data sets that make sense to the team and the research, especially when there may be a subgroup not performing well. This is done without the burden of curating datasets for each evaluation.”
Synthetic Data Boosts Accuracy and Generalizability
Models trained on this synthetic data performed comparably to models trained on real images. Their area under the receiver operating characteristic curve (AUROC) scores trailed those of models trained on real data alone, but improved significantly when the synthetic images were added to real datasets.
Supplementing training sets with synthetic chest X-rays led to statistically significant improvements in model performance across internal and external test sets (e.g., CheXpert, MIMIC-CXR and Emory). These gains were particularly notable in low-prevalence pathologies.
Prior research has shown that synthetic data can have limitations, such as lacking the full complexity and variability of real-world images. Models trained solely on synthetic data may miss subtle clinical nuances. For Dr. Gichoya and her team, using both synthetic and real data yielded better results than synthetic data alone, highlighting synthetic images as a valuable supplement—especially for rare findings and cross-institution generalizability.
“Our study shows that deep learning models trained on synthetic medical images can match the performance of those trained on real data—and that supplementing real datasets with synthetic ones boosts both accuracy and generalizability,” Dr. Gichoya said. “We also highlight how synthetic data can be strategically used to evaluate and mitigate model bias, and how models that encode less demographic information tend to perform more fairly across new clinical settings. These findings offer a practical framework for building more robust and equitable AI systems in medical imaging.”
R&E Grant Provides Resources to Build, Validate Ethical AI
The R&E Foundation grant provided the opportunity and time to develop, test and validate the models.
“Applying for the R&E Foundation grant was helpful, and we have not only benefited as a group, but also in the research community and the industry,” Dr. Gichoya said. “AI is a really important area and it's clear that the regulatory processes cannot currently do a check on everything. As we start to move to more complex models, like multimodal models, the AI basics are going to be very necessary. The work that was done because of this grant is going to help move the field of preventing AI bias forward.”
For More Information
Learn more about R&E Foundation funding opportunities.
Read previous RSNA News stories on AI bias.