Assessing ChatGPT’s Responses to Otolaryngology Patient Questions

Published:2024-04-27 Issue:7 Volume:133 Page:658-664
ISSN:0003-4894
Container-title:Annals of Otology, Rhinology & Laryngology
language:en
Short-container-title:Ann Otol Rhinol Laryngol

Author:

Carnino Jonathan M.¹, Pellegrini William R.¹², Willis Megan³, Cohen Michael B.¹², Paz-Lansberg Marianella¹², Davis Elizabeth M.¹², Grillone Gregory A.¹², Levi Jessica R.¹²

Affiliation:

1. Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA

2. Department of Otolaryngology—Head and Neck Surgery, Boston Medical Center, Boston, MA, USA

3. Department of Biostatistics, Boston University, Boston, MA, USA

Abstract

Objective: This study aims to evaluate ChatGPT’s performance in addressing real-world otolaryngology patient questions, focusing on accuracy, comprehensiveness, and patient safety, to assess its suitability for integration into healthcare.

Methods: A cross-sectional study was conducted using patient questions from the public online forum Reddit’s r/AskDocs, where medical advice is sought from healthcare professionals. Patient questions were input into ChatGPT (GPT-3.5), and responses were reviewed by 5 board-certified otolaryngologists. The evaluation criteria included difficulty, accuracy, comprehensiveness, and bedside manner/empathy. Statistical analysis explored the relationship between patient question characteristics and ChatGPT response scores. Potentially dangerous responses were also identified.

Results: Patient questions averaged 224.93 words, while ChatGPT responses were longer at 414.93 words. The accuracy scores for ChatGPT responses were 3.76/5, comprehensiveness scores were 3.59/5, and bedside manner/empathy scores were 4.28/5. Longer patient questions did not correlate with higher response ratings. However, longer ChatGPT responses scored higher in bedside manner/empathy. Higher question difficulty correlated with lower comprehensiveness. Five responses were flagged as potentially dangerous.

Conclusion: While ChatGPT exhibits promise in addressing otolaryngology patient questions, this study demonstrates its limitations, particularly in accuracy and comprehensiveness. The identification of potentially dangerous responses underscores the need for a cautious approach to AI in medical advice. Responsible integration of AI into healthcare necessitates thorough assessments of model performance and ethical considerations for patient safety.

Publisher

SAGE Publications

Link

https://journals.sagepub.com/doi/pdf/10.1177/00034894241249621

Cited by 1 article.

1. Assessing ChatGPT’s Responses to Otolaryngology Patient Questions: Comment; Annals of Otology, Rhinology & Laryngology; 2024-05-27