Evaluating the Accuracy and Completeness of Artificial Intelligence Responses Against Established Otology Guidelines

Published: 2024-08-12 Issue: 3 Volume: 4 Page: e059
ISSN: 2766-3604
Container-title: Otology & Neurotology Open
Language: en

Author:

Rossi Nicholas A.¹, Corona Kassandra K.², Yoshiyasu Yuki¹, Young Dayton L.¹, McKinnon Brian J.¹

Affiliation:

1. Department of Otolaryngology, University of Texas Medical Branch, Galveston, Texas

2. School of Medicine, University of Texas Medical Branch, Galveston, Texas

Abstract

Background: The incorporation of artificial intelligence (AI), especially large language models like Generative Pretrained Transformer 4 (GPT-4), into medical practice is a burgeoning field of interest. This research evaluates the applicability of GPT-4 in otology by analyzing its responses to queries based on otologic clinical practice guidelines.

Methods: Key guidelines from otology were selected, and corresponding questions were formulated to examine GPT-4’s interpretation and response accuracy. Two independent reviewers assessed the AI-generated answers for accuracy and completeness, using a structured Likert scale. A re-evaluation was conducted to evaluate the reproducibility of the results.

Results: The analysis showed a high accuracy level (mean score: 4.75 of 5) and completeness (mean score: 2.88 of 3) in GPT-4’s responses. The interrater agreement, as indicated by Cohen κ, was substantial. GPT-4 consistently advised on individualized treatment plans and professional consultation, particularly for guidelines with weaker evidence, demonstrating its cautious approach to handling medical information.

Conclusion: GPT-4 exhibits promising potential as an auxiliary tool in otology, providing accurate and comprehensive information. However, its role should be viewed as supplementary, with emphasis on continual updates and careful monitoring to align with evolving medical knowledge. Future studies are recommended to further explore the depth of AI application in diverse clinical scenarios and its real-time impact on clinical outcomes.
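As an illustration of the interrater statistic named in the Results, the sketch below computes an unweighted Cohen κ and a pooled mean score for two reviewers. It is a minimal example under stated assumptions: the abstract does not say whether the authors used a weighted or unweighted κ, the function name `cohen_kappa` is hypothetical, and the ratings are invented for demonstration and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: unweighted kappa; the abstract does not specify the variant.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal frequencies,
    # summed over every score category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical accuracy ratings on a 1-5 Likert scale (NOT study data).
reviewer_1 = [5, 5, 4, 5, 3, 5, 4, 5]
reviewer_2 = [5, 5, 4, 5, 4, 5, 4, 4]

kappa = cohen_kappa(reviewer_1, reviewer_2)

# Pooled mean across both reviewers, analogous to the reported mean scores.
all_scores = reviewer_1 + reviewer_2
mean_accuracy = sum(all_scores) / len(all_scores)

print(f"kappa = {kappa:.3f}, mean accuracy = {mean_accuracy:.2f} of 5")
```

On the commonly used Landis and Koch scale, κ values from 0.61 to 0.80 are labeled "substantial"; whether the authors applied that scale is not stated in the abstract.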

Publisher

Ovid Technologies (Wolters Kluwer Health)
