The Good, the Bad and the Ugly of Using ChatGPT

How radiologists can harness the power of conversational AI programs to assist with specific tasks


Som Biswas, MD
Yiqiu Shen, MS
Linda Moy, MD
Felipe C. Kitamura, MD, PhD

The future of medical writing has already been drafted and its author is AI. At least that’s the opinion of Som Biswas, MD, a radiologist at the University of Tennessee Health Science Center in Memphis. 

“With AI having come to a level at par with human authors, the use of chatbots and natural language processing have the potential to revolutionize the field of medical writing,” he said.

To prove his point, he published an article in Radiology that was entirely written by ChatGPT, the AI chatbot developed by OpenAI. Dr. Biswas was hoping to show by example the extent to which AI has developed and how it can be used by researchers.

“The use of these technologies in medical writing can greatly improve the speed and accuracy of document creation,” Dr. Biswas wrote in his article, or, in this case, the chatbot wrote.

Dr. Biswas noted that he did edit the draft created by ChatGPT. But while ChatGPT is good, even great, at some tasks, it does have its limitations, making the technology a double-edged sword.

“As we navigate the decisions about using AI-based technologies like ChatGPT, it is essential that we proceed with caution,” according to Yiqiu Shen, MS, a researcher at New York University’s Center for Data Science, who co-authored a Radiology editorial about the technology. “Harnessing this technology does hold potential, but can result in some challenges for physicians, health care systems, patients and others working with medical text.”

ChatGPT and Radiology Reporting

Given the proper prompts from the human writer, ChatGPT does a good job at automating many writing-related tasks, including writing radiology reports, according to Dr. Biswas.

For example, once a radiologist makes a diagnosis, such as a meningioma in the right frontal lobe, they can enter the following prompt into ChatGPT: “Create an MRI report on right frontal lobe meningioma.” The program will generate an entire report that only needs to be verified, edited and signed off. Based on this prompt, ChatGPT provided Dr. Biswas with the following report:

Findings: The examination demonstrates a well-circumscribed, heterogenerously enhancing mass in the right frontal lobe, measuring {measurement}. The mass demonstrates intermediate signal intensity on T1-weighted images and high signal intensity on T2-weighted and fluid-attenuated inversion recovery (FLAIR) images, consistent with a meningioma. The mass demonstrates homogeneous contrast enhancement. No significant mass effect or surround edema is identified. No intralesional flow voids are identified to suggest a more aggressive neoplasm.

“Some words in this report were not correct,” Dr. Biswas said. “However, that can be edited by the human radiologist. The majority of the report was good enough to be used as a draft that a radiologist could edit to significantly reduce reporting time.”
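
For readers curious how such a workflow might be scripted, below is a minimal sketch using OpenAI's Python client. The model name, system message and prompt wording are illustrative assumptions, not Dr. Biswas's actual setup, and any output remains a draft that a radiologist must verify, edit and sign off.

```python
# Minimal sketch: requesting a draft MRI report from a large language model.
# Assumptions (not from the article): the model choice, the prompt wording
# and the use of OpenAI's Python client. Requires OPENAI_API_KEY to be set.
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def draft_report(finding: str) -> str:
    """Ask the model for a draft MRI report. The result is a draft only."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": "You draft radiology report text for expert review.",
            },
            {
                "role": "user",
                "content": f"Create an MRI report on {finding}.",
            },
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # The draft must still be verified, edited and signed off by a radiologist.
    print(draft_report("right frontal lobe meningioma"))
```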

Beyond radiology reports, ChatGPT can assist with a number of other medical writing tasks. It has already been used to draft preauthorization letters to insurance providers and to produce patient-facing materials such as post-procedure care instructions. It can also be used for protocoling imaging studies, summarizing clinical information and translating information into different languages.

“In general, it’s ok to use ChatGPT as a language aid or to provide a template, but it’s dangerous to rely on ChatGPT to make a clinical decision,” Shen said. “To make this point clear, it would be hazardous to give commands like, ‘given the lab test result of the patient, please generate a diagnosis and write a report about it.’ It would be more appropriate to give commands like, ‘This is my draft of diagnosis. Please proofread and reformat it so that the report is compatible with xxx format.’”
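
A sketch of the safer prompt pattern Shen describes appears below, using the same hypothetical client as above: the clinician supplies the diagnostic content, and the model is asked only to proofread and reformat it. The draft text and target format are illustrative assumptions.

```python
# Sketch of the safer pattern Shen describes: the clinician supplies the
# diagnosis; the model is asked only to proofread and reformat the draft.
# The draft text and target format below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

clinician_draft = (
    "Impression: well-circumscribed right frontal lobe mass, "
    "imaging features most consistent with meningioma."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {
            "role": "user",
            "content": (
                "This is my draft of a diagnosis. Please proofread and "
                "reformat it so the report follows a Findings/Impression "
                "structure. Do not add, remove or change any clinical "
                f"content.\n\n{clinician_draft}"
            ),
        },
    ],
)
print(response.choices[0].message.content)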

What Could ChatGPT Get Wrong?

While ChatGPT may be good at some writing tasks, it is just as important, if not more so, to keep in mind what it cannot do, Dr. Biswas noted. For starters, while ChatGPT can leverage vast knowledge bases to quickly answer questions regarding the best imaging study in specific clinical scenarios, it can’t do so on its own.

“Because ChatGPT can only follow the instructions it’s given, it must first be given carefully crafted prompts,” Dr. Biswas said. If the information provided is insufficient, ChatGPT tends to make assumptions about what the user wants, and those assumptions can cause problems when providing guidance on, for example, imaging appropriateness. Furthermore, if the prompts don’t include specific information about a clinical scenario or potential contraindications, the chatbot might come back with a seemingly credible but ultimately incorrect response. Shen calls this the “hallucination effect,” in which an AI system generates incorrect content and phrases it in a convincing manner that can fool non-experts.

“For instance, if ChatGPT is uncertain about the correct diagnosis of a patient, it may use a factual tone to assert a diagnosis and provide fabricated references to support its conclusion. This could be concerning, as medical reports or diagnoses generated by ChatGPT without proper expert review could lead to misdiagnosis and harm to patients,” Shen said.

Linda Moy, MD, editor of Radiology and professor of radiology at the NYU Grossman School of Medicine, noted that the fabrication of medical research is also a big issue in medical publishing.

“The COVID-19 pandemic has sowed seeds of mistrust in science and health,” Dr. Moy said. “To combat this, careful peer review and use of currently available software should be used to check for plagiarism and to catch any artificial abstracts or fabricated articles.”

Another example of the hallucination effect is when ChatGPT uses a confident tone to describe something that is actually uncertain, such as confidently describing a finding in a medical report as being benign when it actually appears to be cancerous.

“In my opinion, the most important thing physicians can do to proactively address these limitations is to only use ChatGPT to write about topics one is knowledgeable about,” said Felipe C. Kitamura, MD, PhD, head of applied innovation and AI at Dasa Hospital in São Paulo, who also used ChatGPT to write an editorial for Radiology. “Only by doing that can we guarantee that the generated text will be critically reviewed.”

Dr. Moy noted that several leading medical journals, such as JAMA, Science and Nature, have written editorials about publication policies on ChatGPT. All of them state that ChatGPT is not an author.

The Future of Using AI-Powered Language Programs in Medicine

Beyond the issues of what ChatGPT can’t do, there are also questions about what it should or shouldn’t be allowed to do.

“ChatGPT has raised a range of questions about ethics, including concerns about authorship and accountability for the content being generated,” Dr. Biswas said.

Because the content ChatGPT generates is not original but derived from existing text, including the books, documents, websites and other sources in its training data, its use raises issues of copyright infringement.

“There is a non-negligible chance that parts of sentences generated by ChatGPT match exact phrases in the training data,” Dr. Kitamura said. “Although there is no report of copyright infringement so far, it is reasonable to check for plagiarism in these early days of large language models.”

Furthermore, a chatbot’s answers are contingent on the quality and type of data used to train it. ChatGPT was trained on data published before 2021, which is concerning because it could return outdated ICD-10 and CPT codes that have since been updated. In addition, there is the risk of biases, inaccuracies or misconceptions in the text.

“Users should be aware of the potential for biases, inaccuracies and misconceptions in the data,” Shen added. “ChatGPT may be more likely to generate text that confirms existing beliefs or biases, rather than challenging them or presenting alternative perspectives. When ChatGPT is used as a second checker in a diagnosis, it’s less likely to challenge a clinician’s opinion.”

Despite the challenges and risks, Dr. Biswas said he believes that, used effectively, ChatGPT’s potential could outweigh its current limitations.

“This technology can significantly reduce the back-breaking burden we now face due to ever-increasing imaging and report requests and a nationwide shortage of radiologists,” Dr. Biswas concluded. “It is exciting to consider the possibilities of what ChatGPT and similar AI-based technologies have in store for radiology in the future.” 

For More Information

Access the Radiology commentaries:

• “ChatGPT and Other Large Language Models Are Double-edged Swords”

• “ChatGPT and the Future of Medical Writing”

• “ChatGPT Is Shaping the Future of Medical Writing but Still Requires Human Judgment”

Listen to the Radiology podcast: “ChatGPT - Special Radiology: AI Podcast Collaboration”
