From Complexity to Clarity: ChatGPT's Potential to Increase Health Literacy

LLMs can provide accurate clinical information on lung and breast cancer screening patient education materials

Paul Yi, MD
Jean Jeudy, MD
Hana Haver, MD

Among U.S. patients with low health literacy, cancer screening rates are lower than the national average. It is thought that low health literacy creates barriers to screening, such as not understanding exams and why they are important.

With the advent of ChatGPT in late 2022, researchers began studying the tool’s capability to simplify cancer screening and prevention information, making it more decipherable to patients and increasing the likelihood of adherence to screening programs.

Paul H. Yi, MD, assistant professor of diagnostic radiology and nuclear medicine at the University of Maryland School of Medicine (UMSOM), Baltimore, explained the links between health literacy and adherence to screening, and why current patient education materials create a barrier to improving health literacy.

“In the case of cancer, increased health literacy is associated with greater adherence to things like breast cancer screening. If patients are better informed, they’re more likely to make good health decisions,” he said. “The problem is that health educational materials are notoriously hard to read. They are written often at the 12th grade level or higher, making it hard for some patients to understand them.”

For lung cancer in particular, screening rates are abysmally low, said Jean Jeudy, MD, professor of diagnostic radiology and nuclear medicine at UMSOM. Dr. Jeudy hopes that with the help of large language models (LLMs) like ChatGPT, eligible patients can understand the importance of lung cancer screening and better adhere to screening programs, even if they face socioeconomic challenges that would prevent them from getting to a screening.

“In some reports, only 5%-8% of the population eligible for lung cancer screening are getting screened. There are a lot of medical terms within patient education materials that patients may not understand, so we want to see if we can use LLMs to simplify the message and bring it to a broader audience,” he said. “Hopefully with a clearer understanding, patients can improve their overall decision making and increase the rates of screening that we’re seeing globally.”

Direct Your Patients to RadiologyInfo.org for Patient-Friendly Information

Introduced in 2003 and co-produced by RSNA and the American College of Radiology (ACR), RadiologyInfo.org is home to patient-friendly information about more than 270 imaging procedures, exams and disease topics.

New additions to the site include videos on how to read a radiology report. Videos on reading abdominal/pelvic CT and chest X-ray reports are available now. In addition, among the site’s most popular articles are those related to commonly performed procedures like abdominal and cardiac CT, head CT and CT angiography.

All content is also available in Spanish, with 30 videos translated into Spanish. Content is reviewed regularly by a committee of 16 multi-institutional radiology professionals, with approximately 85 medical advisors who assist each year with writing and content review.

When your patients have questions and need more information, refer them to RadiologyInfo.org.

Accessible and Accurate Health Information in a Few Taps

Drs. Yi and Jeudy were part of research teams, including first author Hana Haver, MD, breast imaging fellow at Massachusetts General Hospital and former radiology resident at the University of Maryland, that investigated ChatGPT’s ability to simplify screening and prevention information on lung and breast cancers. In the lung cancer study, the researchers formulated 25 common patient questions about lung cancer screening, and then entered them into ChatGPT on three separate occasions. They categorized the responses as appropriate, inappropriate or inconsistent. ChatGPT produced responses that were appropriate 84% of the time.

“Because of the nature of ChatGPT, it can give you different responses at different times. So, we wanted to verify its fidelity to its responses,” Dr. Jeudy explained. “We found that 21 out of 25 of those questions always elicited appropriate responses. There were two which generated inappropriate responses, and two which we thought were inconsistent.”
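The three-run consistency check described above can be sketched as a small labeling helper. This is a hypothetical illustration, not the study's actual code: it assumes a question whose three responses were all rated appropriate is labeled "appropriate," all inappropriate is "inappropriate," and any disagreement across runs is "inconsistent."

```python
def classify_question(ratings: list[str]) -> str:
    """Label a screening question from physician ratings of its
    three ChatGPT responses ('appropriate' or 'inappropriate')."""
    if len(set(ratings)) > 1:
        return "inconsistent"  # the three runs disagreed with one another
    return ratings[0]          # unanimous across all three runs

# Hypothetical per-question ratings mirroring the lung cancer
# study's reported breakdown: 21 always appropriate, 2 always
# inappropriate, 2 inconsistent.
results = [classify_question(r) for r in (
    [["appropriate"] * 3] * 21
    + [["inappropriate"] * 3] * 2
    + [["appropriate", "inappropriate", "appropriate"]] * 2
)]
```

Repeating each prompt and comparing the labels, as above, is what lets a reviewer distinguish a model that is reliably wrong on a question from one that is merely unstable on it.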

The breast cancer study, also first-authored by Dr. Haver, followed up on a previous study similar in design to the lung cancer study, which asked ChatGPT to answer 25 commonly asked patient questions about breast cancer and screening. In the follow-up, the researchers asked ChatGPT to lower the reading level of each of the 25 responses to a sixth-grade level. They repeated this prompt three times for each response, to see if the responses could be further refined.

“The responses in the original study, although 88%-90% accurate and appropriate, were written on average at grade 13 level, or first year in college, while the average U.S. adult reads at a sixth grade level,” Dr. Yi said. “When we asked ChatGPT to rewrite these responses at a sixth-grade reading level, there were statistically significant decreases to about eighth or ninth grade level, and 92% of them were clinically appropriate.”
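Grade levels like the ones cited here are typically estimated with readability formulas. A minimal sketch of one common metric, the Flesch-Kincaid grade level, is below; the vowel-group syllable counter is a rough heuristic, and this is an illustration of how such scores work, not the researchers' actual tooling.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per run of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Estimate the U.S. school grade level needed to read `text`.
    Expects at least one sentence and one word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Because the formula rewards short sentences and short words, rewriting a response in simpler language, as the researchers prompted ChatGPT to do, directly lowers the computed grade level.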

Dr. Yi added that the response to a question about breast density, often a confusing topic for patients, was successfully simplified to a ninth-grade level.

“The original response for that particular question was approximately college level understanding. When it was rewritten, it went down to about the ninth-grade level,” Dr. Yi said. “I think this is notable because breast density is a potentially challenging and confusing topic. It’s encouraging that the same improvements that we saw overall held for this important topic.”

Physician Oversight Needed in Patient Use of Large Language Models

Drs. Haver, Yi and Jeudy agree that the technology is not yet capable of simplifying cancer prevention and screening information for patients without physician oversight.

Dr. Yi noted that “hallucinations,” a common defect of LLMs, can result in an LLM providing incorrect information.

“In our research, there’s still a 10% inaccuracy rate in this model’s responses, so I would submit that a 10% margin of error is not good enough to allow LLMs to run autonomously,” Dr. Yi said. “Any amount of misinformation can have significant consequences for patients.” Dr. Yi also explained that the ability of any LLM to provide correct information depends heavily on its data training set.

Given that health guidelines change, an LLM that hasn’t updated its training set may provide outdated information.

“LLMs are limited by the training data used, meaning the model may have only learned or been exposed to data up to a certain date. Because health care information changes, there’s a chance that an LLM may have outdated information,” Dr. Yi said.

Dr. Jeudy emphasized that regardless of the increasing ability of an LLM to provide clinically accurate, up-to-date information, it cannot synthesize that information for use in individual patient cases.

“Even when correct information is given to patients, they still need physicians to personalize the information according to their medical history and to address their concerns,” he said. “Although LLMs can accentuate medical information for patients, physicians are still going to be needed to tailor the information to individual cases.” 

For More Information

Access the Radiology: Cardiothoracic Imaging study, “Evaluating ChatGPT’s Accuracy in Lung Cancer Prevention and Screening Recommendations.”

Access the Radiology: Imaging Cancer study, “Evaluating the Use of ChatGPT to Accurately Simplify Patient-centered Information about Breast Cancer Prevention and Screening.”

Read previous RSNA News articles on uses for ChatGPT: