Enhancing LLM Performance with Retrieval-Augmented Generation

Despite a few challenges, providing access to specific, relevant, up-to-date data helps mitigate problems inherent in LLMs


Dane Weinert, MD
Judy Gichoya, MD, MS

Large language models (LLMs) are at the heart of many AI programs. Trained on vast amounts of data, these models are designed to perform a wide range of tasks, including summarizing text and answering queries.

Yet although LLMs can do a lot, they have their limitations.

“LLMs only know as much as they are taught. For a knowledge domain like radiology, which is specialized, has a constantly growing knowledge base, and requires a high degree of accuracy, this knowledge gap can cause problems,” said Dane Weinert, MD, a radiology resident at the University of Southern California Keck School of Medicine in Los Angeles.

One of those problems is hallucinations. “While LLMs are good at producing fluent text, they struggle with understanding and retaining factual information,” said Judy W. Gichoya, MD, MS, associate professor in the Department of Radiology and Imaging Sciences at Emory University School of Medicine in Atlanta. “This can cause them to hallucinate, where they fill gaps in knowledge with information that sounds plausible but is actually false.”

One potential solution is retrieval-augmented generation, also known as RAG.

A Cheat Sheet for LLMs

RAG is a method for providing an LLM with information it can use to supplement and inform its answer. “A RAG is like a cheat sheet in that it gives the LLM the specific knowledge it should focus on and, in doing so, helps steer the LLM away from making up answers,” Dr. Gichoya explained.

RAG relies on a vector database of desired information that can be accessed whenever an LLM is queried. A vector database stores text as numerical representations, called embeddings, that capture meaning and can be compared mathematically.

“The user’s query is transformed into a vector representation and is then matched to text in the vector database that closely matches its vector representation,” Dr. Weinert explained. “This text is then provided to the LLM in the user’s prompt and can be seen by the LLM prior to answer generation.”   
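The retrieval step Dr. Weinert describes can be sketched in a few lines of Python. Everything below is illustrative: a real system would use a learned embedding model and a dedicated vector store, whereas this toy version uses simple word-count vectors and cosine similarity, with made-up passages standing in for journal text.

```python
import math

def embed(text: str) -> dict:
    """Toy stand-in for an embedding model: a word-count vector."""
    vec = {}
    for word in text.lower().split():
        word = word.strip(".,?!")
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine_similarity(a: dict, b: dict) -> float:
    """Similarity between two sparse vectors (1.0 = identical direction)."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# The "vector database": each passage is stored alongside its vector.
passages = [
    "Pneumothorax appears as absent lung markings beyond a visible pleural line.",
    "MRI is preferred for evaluating soft tissue tumors of the extremities.",
]
index = [(p, embed(p)) for p in passages]

def retrieve(query: str, k: int = 1) -> list:
    """Return the k passages whose vectors are most similar to the query's."""
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine_similarity(qv, item[1]), reverse=True)
    return [p for p, _ in ranked[:k]]

# The retrieved text is placed in the prompt, so the LLM sees it before answering.
query = "What imaging finding suggests a pneumothorax?"
context = retrieve(query)[0]
prompt = f"Context: {context}\n\nQuestion: {query}"
```

In production, the hand-rolled pieces above are swapped for an embedding model and an approximate-nearest-neighbor index, but the flow is the same: embed the query, find the closest stored text, and prepend it to the prompt.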

In theory, RAG sounds promising, but does it actually work?

That’s the question Dr. Weinert sought to answer in a recent Radiology: Artificial Intelligence study. “Our goal was to evaluate whether popular open- and closed-source LLMs could benefit from RAG on a multiple-choice test of radiology knowledge,” he said.

The study developed a vector database of approximately 3,600 RadioGraphics articles published between 1999 and 2023. Five popular LLMs were tested both with and without the benefit of RAG on a multiple-choice test.

The test was based on the publicly available 2014 American Board of Radiology CORE Examination Study Guide and the American College of Radiology’s Diagnostic Radiology In-Training (DXIT) Examinations practice tests from 2020, 2021 and 2022. The test also included a subgroup of questions sourced directly from the RadioGraphics articles.


A Clear Potential to Enhance LLM Performance

With this test in hand, researchers were able to compare the inherent knowledge of an LLM against its potential RAG-enhanced performance.

They found that RAG significantly improved the performance of the GPT-4 and Command R+ models but had little, if any, impact on the Claude 3 Opus, Mixtral 8x7B and Gemini 1.5 Pro LLMs.

On the subgroup of questions sourced directly from RadioGraphics articles, the RAG-enhanced systems outperformed the standalone LLMs, successfully retrieving 21 of 24 relevant references cited in the answer explanations and accurately citing them in 18 of 21 outputs.

According to Dr. Weinert, these findings confirm that a thoughtfully constructed vector database can enhance an LLM. “If one wanted to create a super-specialized LLM assistant that had state-of-the-art knowledge of oncologic imaging and related medical fields, one might craft a vector database of all the relevant journal articles that would inform that LLM,” he said.

This database could then be dynamically updated with new articles and guidelines in real time. “This removes the need for time-consuming, costly and complex fine-tuning,” Dr. Weinert added. “It also allows the LLM to pull from a knowledge base that is more up-to-date and niche than what is available either via a simple web search or what is built into the inherent knowledge of a standard model.”
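Dr. Weinert's point can be made concrete with a small sketch. The names below are hypothetical; the idea is simply that updating a RAG knowledge base is an insert into a data store, not a retraining run.

```python
# Minimal sketch of a dynamically updatable knowledge base.
# `fake_embed` is a hypothetical placeholder for a real embedding model.

knowledge_base = []  # list of (text, vector) records

def fake_embed(text: str) -> list:
    # Placeholder: a real system would call an embedding model here.
    return [float(len(text)), float(text.count(" "))]

def add_article(text: str) -> None:
    """Index a newly published article immediately; no fine-tuning needed."""
    knowledge_base.append((text, fake_embed(text)))

# New publications become retrievable the moment they are indexed.
add_article("2023 update: management of incidental adrenal masses.")
add_article("New consensus guideline on pulmonary nodule follow-up.")
```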

A RAG Is Only as Good as Its Database

RAG has a core restriction—it can only retrieve information that exists in its own database. “A RAG system is only as good as the knowledge library,” Dr. Gichoya said. “If that library overrepresents certain topics and underrepresents others, the system’s performance will reflect that bias.”

This is why Dr. Weinert would like to see radiology journals provide application programming interface (API) access to clinician-developers. “Doing so would allow these journals’ rich corpus of information to be used in RAG-like applications and would allow the journal to take on a new ‘information as a service’ model, one that would further elevate their role as critical providers of a much-needed resource,” he said.  

There’s also the question of who gets to curate the database. “RadioGraphics is very clean and safe data, but real-world data is messy, so you really have to think about the source of the data being used,” Dr. Gichoya emphasized. 


A Temporary Solution

Beyond the challenge of curation, there’s also the risk that RAG could curtail an LLM’s performance. “The power of LLM is in its broadness—that it can learn from an infinite number of resources,” Dr. Gichoya noted. “If you restrict it, as RAG does, you risk missing out on the opportunity that is LLM.”  

This is why Dr. Gichoya sees RAG as a temporary solution—a way to mitigate hallucination, build trust in what is being generated, and buy time while we navigate these nascent days of AI. “The potential of LLMs is huge, but that potential will only be met when we figure out how to use them safely, and that takes time,” she said. “RAG provides a way forward for using LLM in a more controlled domain while we work toward something bigger.”  
