Can ChatGPT be an Effective Patient Communication Tool in Radiology?

October 20, 2023

While ChatGPT has the potential to help streamline responses to imaging-related questions from patients, the authors of a new study found that a third of ChatGPT responses to unprompted questions on medical imaging were not “fully relevant.”

Can ChatGPT adequately address common questions from patients about medical imaging?

With this question in mind, researchers recently examined whether the generative pre-trained language model (GPLM) could provide accurate and relevant answers to 22 imaging questions related to safety, imaging procedures, terminology, the radiology report, and other topics identified as being important to patients.

The researchers asked the questions three times to assess the consistency of the ChatGPT (version 3.5, OpenAI) responses. They compared responses to questions posed with no additional prompt against responses to questions accompanied by a modifying prompt that emphasized accuracy and readability for the average person, according to the study, which was recently published in the Journal of the American College of Radiology.

The researchers found no significant difference in accuracy between unprompted ChatGPT (version 3.5) responses (82.6 percent) and responses to an additional modifying prompt (86.7 percent). For no-prompt responses, the study authors noted consistency of responses 71.6 percent of the time, but this percentage increased to 86.4 percent with responses to modifying prompts.

“Automating the development of patient health educational materials and providing on-demand access to medical questions holds great promise to improve patient access to health information,” wrote study co-author Alessandro Furlan, M.D., an associate professor of radiology and chief of the Abdominal Imaging Section at the University of Pittsburgh Medical Center (UPMC), and colleagues.

While the researchers found that 98.5 percent of unprompted responses and 98.9 percent of responses to modifying prompts were at least partially relevant to the posed questions, they noted that complete relevance only occurred with 66.7 percent of unprompted ChatGPT responses and 79.6 percent of responses to modifying prompts.

The researchers acknowledged that the lowest percentages for full relevance were seen with safety-related questions such as “What are the risks of MRI during pregnancy?” and “Do X-rays or CT scans cause cancer?” Only 50 percent of the no-prompt responses to safety questions were deemed fully relevant, a figure that rose to 64.6 percent with a modifying prompt.

“While the accuracy, consistency, and relevance of the ChatGPT responses to imaging-related questions are impressive for a GPLM, they are imperfect,” noted Furlan and colleagues. “By clinical standards, the frequency of inaccurate statements that we observed precludes its use without careful human supervision or review.”

Three Key Takeaways

ChatGPT's accuracy and consistency. Researchers found that ChatGPT (version 3.5) provided over 80 percent accuracy with responses to medical imaging questions and had a 71.6 percent consistency rate for unprompted responses and 86.4 percent with modifying prompts.
Relevance of responses. While most responses were at least partially relevant to the questions, complete relevance was lower, with 66.7 percent of unprompted responses and 79.6 percent of prompted responses considered fully relevant. Safety-related questions had the lowest full relevance percentages.
Readability issues. The readability of ChatGPT responses was a concern, as none of the responses were at or below an eighth-grade reading level. The high complexity of responses could hinder patient access to health information.

(Editor’s note: For related content, see “Can ChatGPT Pass a Radiology Board Exam?,” “Can ChatGPT Provide Appropriate Information on Mammography and Other Breast Cancer Screening Topics?” and “Can ChatGPT Have an Impact in Radiology?”)

Using Flesch-Kincaid readability testing, the researchers noted no significant differences between prompted and unprompted ChatGPT responses with respect to reading level. The study authors pointed out that none of the responses were at or below an eighth-grade reading level. They added that only 30 percent of unprompted ChatGPT responses and 41 percent of prompted responses were below a 12th-grade reading level.
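
For readers unfamiliar with the Flesch-Kincaid grade level, the following is a minimal Python sketch of the standard published formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. It is not the study authors' tooling, which the article does not describe. The function names and the vowel-group syllable counter are illustrative assumptions, and the syllable heuristic is only approximate.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups.

    Assumption: a crude heuristic, not a pronunciation dictionary.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic (e.g. "dose"),
    # but keep endings such as "-le" (e.g. "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of a passage of English text."""
    # Split on runs of sentence-ending punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    simple = "The scan is safe. It does not hurt."
    dense = "Magnetic resonance imaging utilizes radiofrequency pulses."
    print(round(flesch_kincaid_grade(simple), 1))
    print(round(flesch_kincaid_grade(dense), 1))
```

A higher score corresponds to a higher U.S. school grade, so short sentences built from short words score lower than long sentences dense with polysyllabic terminology. Dedicated readability software may count syllables and sentences differently, so scores from this sketch should be treated as rough estimates.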

“The ability to understand health information presented to patients is crucial for their capacity to make informed medical decisions,” maintained Furlan and colleagues. “As it currently stands, the high complexity of the responses clouds the promise of true patient access to health information.”

With regard to study limitations, the authors conceded that the rapid evolution of ChatGPT is likely to affect the platform's effectiveness in answering common patient questions on medical imaging. Pointing out that the questions used to assess ChatGPT responses were written by radiologists, the researchers noted these questions may not reflect the variability with which patients may ask similarly themed questions. The study authors also noted a lack of clarity as to how ChatGPT would handle questions posed in languages other than English.
