Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions

Published:2024-04-02 Issue:4 Volume:7 Page:e244630
ISSN:2574-3805
Container-title:JAMA Network Open
language:en
Short-container-title:JAMA Netw Open

Author:

Yalamanchili Amulya¹,Sengupta Bishwambhar¹,Song Joshua¹,Lim Sara¹,Thomas Tarita O.¹,Mittal Bharat B.¹,Abazeed Mohamed E.¹,Teo P. Troy¹

Affiliation:

1. Robert H. Lurie Comprehensive Cancer Center, Department of Radiation Oncology, Northwestern Memorial Hospital, Northwestern University Feinberg School of Medicine, Chicago, Illinois

Abstract

Importance: Artificial intelligence (AI) large language models (LLMs) demonstrate potential in simulating human-like dialogue. Their efficacy in accurate patient-clinician communication within radiation oncology has yet to be explored.

Objective: To determine an LLM’s quality of responses to radiation oncology patient care questions using both domain-specific expertise and domain-agnostic metrics.

Design, Setting, and Participants: This cross-sectional study retrieved questions and answers from websites (accessed February 1 to March 20, 2023) affiliated with the National Cancer Institute and the Radiological Society of North America. These questions were used as queries for an AI LLM, ChatGPT version 3.5 (accessed February 20 to April 20, 2023), to prompt LLM-generated responses. Three radiation oncologists and 3 radiation physicists ranked the LLM-generated responses for relative factual correctness, relative completeness, and relative conciseness compared with online expert answers. Statistical analysis was performed from July to October 2023.

Main Outcomes and Measures: The LLM’s responses were ranked by experts using domain-specific metrics such as relative correctness, conciseness, completeness, and potential harm compared with online expert answers on a 5-point Likert scale. Domain-agnostic metrics encompassing cosine similarity scores, readability scores, word count, lexicon, and syllable counts were computed as independent quality checks for LLM-generated responses.

Results: Of the 115 radiation oncology questions retrieved from 4 professional society websites, the LLM performed the same or better in 108 responses (94%) for relative correctness, 89 responses (77%) for completeness, and 105 responses (91%) for conciseness compared with expert answers. Only 2 LLM responses were ranked as having potential harm. The mean (SD) readability consensus score for expert answers was 10.63 (3.17) vs 13.64 (2.22) for LLM answers (P < .001), indicating 10th grade and college reading levels, respectively. The mean (SD) number of syllables was 327.35 (277.15) for expert vs 376.21 (107.89) for LLM answers (P = .07), the mean (SD) word count was 226.33 (191.92) for expert vs 246.26 (69.36) for LLM answers (P = .27), and the mean (SD) lexicon score was 200.15 (171.28) for expert vs 219.10 (61.59) for LLM answers (P = .24).

Conclusions and Relevance: In this cross-sectional study, the LLM generated accurate, comprehensive, and concise responses with minimal risk of harm, using language similar to human experts but at a higher reading level. These findings suggest the LLM’s potential, with some retraining, as a valuable resource for patient queries in radiation oncology and other medical fields.
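The domain-agnostic metrics named in the abstract (cosine similarity and word count) can be illustrated with a minimal Python sketch. The abstract does not say how the texts were tokenized or vectorized, so the raw term-frequency representation, the function names, and the two example sentences below are all assumptions made for illustration, not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_count(text):
    """Number of word tokens in the text."""
    return len(tokenize(text))


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors of two texts.

    Assumption: a simple bag-of-words representation; the study may have
    used a different vectorization.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical example texts, not taken from the study.
expert_answer = "Radiation therapy uses high-energy rays to kill cancer cells."
llm_answer = "Radiation therapy uses high-energy beams to destroy cancer cells."

similarity = cosine_similarity(expert_answer, llm_answer)
print(f"words: {word_count(expert_answer)} vs {word_count(llm_answer)}")
print(f"cosine similarity: {similarity:.2f}")
```

A score near 1 means the two answers use largely the same vocabulary in similar proportions. It says nothing about factual correctness, which is why the study pairs such metrics with expert ranking.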

Publisher

American Medical Association (AMA)

Link

https://jamanetwork.com/journals/jamanetworkopen/articlepdf/2816884/yalamanchili_2024_oi_240202_1710966452.11228.pdf

Cited by 3 articles.

1. Testing and Validation of a Custom Retrained Large Language Model for the Supportive Care of HN Patients with External Knowledge Base;Cancers;2024-06-24

2. Performance of Large Language Models on Medical Oncology Examination Questions;JAMA Network Open;2024-06-18

3. Evaluation of AI ChatBots for the Creation of Patient-Informed Consent Sheets;Machine Learning and Knowledge Extraction;2024-05-24