Deepfake X-Rays Fool Radiologists and AI

Findings raise concerns about cybersecurity and diagnostic trust


Mickael Tordjman, MD

Neither radiologists nor multimodal large language models (LLMs) are able to easily distinguish AI-generated “deepfake” X-ray images from authentic ones, according to a study published in Radiology. The findings highlight the potential risks associated with AI-generated X-ray images, along with the need for tools and training to protect the integrity of medical images and prepare health care professionals to detect deepfakes.

The term “deepfake” refers to a video, photo, image or audio recording that appears real but has been created or manipulated using AI.

“Our study demonstrates that these deepfake X-rays are realistic enough to deceive radiologists, the most highly trained medical image specialists, even when they were aware that AI-generated images were present,” said lead study author Mickael Tordjman, MD, post-doctoral fellow, Icahn School of Medicine at Mount Sinai, New York. “This creates a high-stakes vulnerability for fraudulent litigation if, for example, a fabricated fracture could be indistinguishable from a real one. There is also a significant cybersecurity risk if hackers were to gain access to a hospital’s network and inject synthetic images to manipulate patient diagnoses or cause widespread clinical chaos by undermining the fundamental reliability of the digital medical record.”

Seventeen radiologists from 12 centers in six countries (United States, France, Germany, Turkey, United Kingdom and United Arab Emirates) participated in the retrospective study. Their professional experience ranged from 0 to 40 years. Half of the 264 X-ray images in the study were authentic, and the other half were generated by AI. Radiologists were evaluated on two distinct image sets, with no overlap between the datasets. The first dataset included real and ChatGPT-generated images of multiple anatomical regions. The second dataset included chest X-ray images—half authentic and the other half created by RoentGen, an open-source generative AI diffusion model developed by Stanford Medicine researchers.

When radiologist readers were unaware of the study’s true purpose and were asked, after rating the technical quality of each ChatGPT-generated image, whether they noticed anything unusual, only 41% spontaneously identified AI-generated images. After being informed that the dataset contained synthetic images, the radiologists’ mean accuracy in differentiating real from synthetic X-rays was 75%.

Individual radiologist performance in detecting the ChatGPT-generated images ranged from 58% to 92%. Similarly, the accuracy of four multimodal LLMs—GPT-4o (OpenAI), GPT-5 (OpenAI), Gemini 2.5 Pro (Google), and Llama 4 Maverick (Meta)—ranged from 57% to 85%. Even GPT-4o, the model used to create the deepfakes, was unable to detect all of them, though it identified considerably more than the Google and Meta models.

Radiologist accuracy in detecting the RoentGen synthetic chest X-rays ranged from 62% to 78%, and the LLMs’ performance ranged from 52% to 89%.

There was no correlation between a radiologist’s years of experience and their accuracy in detecting synthetic X-ray images. However, musculoskeletal radiologists demonstrated significantly higher accuracy than other radiology subspecialists.


Anatomy-matched real and GPT-4o-generated radiographs: (A) real and (B) GPT-4o-generated posteroanterior chest radiographs, (C) real and (D) GPT-4o-generated lateral cervical spine radiographs, (E) real and (F) GPT-4o-generated posteroanterior hand radiographs, and (G) real and (H) GPT-4o-generated lateral lumbar spine radiographs. The pairs demonstrate that GPT-4o can produce radiographically plausible images across different anatomic regions.

https://doi.org/10.1148/radiol.252094 ©RSNA 2026

Spotting the Risks in Synthetic Imaging

"Deepfake medical images often look too perfect,” Dr. Tordjman said. “Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform, and fractures appear unusually clean and consistent, often limited to one side of the bone."

To distinguish real images from fakes and help prevent tampering, recommended safeguards include invisible watermarks that embed ownership or identity data directly into the images, and technologist-linked cryptographic signatures attached automatically when the images are captured.
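The capture-time signature idea can be illustrated with a minimal sketch. The code below uses a shared-secret HMAC from the Python standard library purely for simplicity; a real deployment would use asymmetric, per-technologist keys (for example, the DICOM Digital Signatures profile), and the key name and byte strings here are hypothetical placeholders, not part of the study.

```python
import hashlib
import hmac

# Hypothetical per-technologist secret; production systems would use
# public-key signatures tied to the acquiring device and operator.
TECH_KEY = b"technologist-7f3a-secret"

def sign_image(pixel_bytes: bytes, key: bytes = TECH_KEY) -> str:
    """Compute an integrity tag over raw image bytes at capture time."""
    return hmac.new(key, pixel_bytes, hashlib.sha256).hexdigest()

def verify_image(pixel_bytes: bytes, tag: str, key: bytes = TECH_KEY) -> bool:
    """Accept an image only if its bytes still match the capture-time tag."""
    expected = hmac.new(key, pixel_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

# Sign at acquisition, verify before the image is read.
original = b"\x00\x01\x02"  # placeholder for raw pixel data
tag = sign_image(original)
print(verify_image(original, tag))           # untouched image passes
print(verify_image(original + b"!", tag))    # any alteration is rejected
```

Note that such a tag proves the stored bytes have not changed since capture; it cannot, by itself, tell a radiologist whether the capture was of a real patient, which is why it complements rather than replaces watermarking and reader training.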

“We are potentially only seeing the tip of the iceberg,” Dr. Tordjman said. “The logical next step in this evolution is AI-generation of synthetic 3D images, such as CT and MRI. Establishing educational datasets and detection tools now is critical.”

The study’s authors have published a curated deepfake dataset with interactive quizzes for educational purposes.

For More Information

Access the Radiology study, “The Rise of Deepfake Medical Imaging: Radiologists’ Diagnostic Accuracy in Detecting ChatGPT-generated Radiographs,” and the related editorial, “The Democratization of Deceit: Seeing Is No Longer Believing.”

Read previous RSNA stories about AI in medical imaging: