Reliability of large language models in managing odontogenic sinusitis clinical scenarios: a preliminary multidisciplinary evaluation

Published:2024-01-08 Issue:4 Volume:281 Page:1835-1841
ISSN:0937-4477
Container-title:European Archives of Oto-Rhino-Laryngology
language:en
Short-container-title:Eur Arch Otorhinolaryngol

Author:

Saibene Alberto Maria, Allevi Fabiana, Calvo-Henriquez Christian, Maniaci Antonino, Mayo-Yáñez Miguel, Paderno Alberto, Vaira Luigi Angelo, Felisati Giovanni, Craig John R.

Abstract

Purpose: This study aimed to evaluate the utility of large language model (LLM) artificial intelligence tools, Chat Generative Pre-Trained Transformer (ChatGPT) versions 3.5 and 4, in managing complex otolaryngological clinical scenarios, specifically for the multidisciplinary management of odontogenic sinusitis (ODS).

Methods: A prospective, structured multidisciplinary specialist evaluation was conducted using five ad hoc designed ODS-related clinical scenarios. LLM responses to these scenarios were critically reviewed by a multidisciplinary panel of eight specialist evaluators (2 ODS experts, 2 rhinologists, 2 general otolaryngologists, and 2 maxillofacial surgeons). Based on the level of disagreement from panel members, a Total Disagreement Score (TDS) was calculated for each LLM response, and TDS comparisons were made between ChatGPT3.5 and ChatGPT4, as well as between different evaluators.

Results: While some degree of disagreement was demonstrated in 73/80 evaluator reviews of the LLMs' responses, TDSs were significantly lower for ChatGPT4 than for ChatGPT3.5. The highest TDSs were found in the case of complicated ODS with orbital abscess, presumably due to increased case complexity, with dental, rhinologic, and orbital factors affecting diagnostic and therapeutic options. There were no statistically significant differences in TDSs between evaluators' specialties, though ODS experts and maxillofacial surgeons tended to assign higher TDSs.

Conclusions: LLMs like ChatGPT, especially newer versions, showed potential for complementing evidence-based clinical decision-making, but substantial disagreement was still demonstrated between LLMs and clinical specialists across most case examples, suggesting they are not yet optimal in aiding clinical management decisions. Future studies will be important to analyze LLMs' performance as they evolve over time.

Funder

Università degli Studi di Milano

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s00405-023-08372-4.pdf

References: 24 articles.

1. Liu S, Wright AP, Patterson BL et al (2023) Using AI-generated suggestions from ChatGPT to optimize clinical decision support. J Am Med Inform Assoc 30:1237–1245. https://doi.org/10.1093/jamia/ocad072

2. Chiesa-Estomba CM, Lechien JR, Vaira LA et al (2023) Exploring the potential of Chat-GPT as a supportive tool for sialendoscopy clinical decision making and patient information support. Eur Arch Otorhinolaryngol. https://doi.org/10.1007/s00405-023-08104-8

3. Saibene AM, Pipolo C, Borloni R et al (2021) ENT and dentist cooperation in the management of odontogenic sinusitis. A review. Acta Otorhinolaryngol Ital 41:S116–S123. https://doi.org/10.14639/0392-100x-suppl.1-41-2021-12

4. Allevi F, Fadda GL, Rosso C et al (2021) Diagnostic criteria for odontogenic sinusitis: a systematic review. Am J Rhinol Allergy 35:713–721. https://doi.org/10.1177/1945892420976766

5. Craig JR, Saibene AM, Felisati G (2021) Chronic odontogenic rhinosinusitis: optimization of surgical treatment indications. Am J Rhinol Allergy 35:142–143. https://doi.org/10.1177/1945892420965474

Cited by 9 articles.

1. Generative AI and Otolaryngology—Head & Neck Surgery;Otolaryngologic Clinics of North America;2024-10

2. Evaluation of Vertigo-Related Information from Artificial Intelligence Chatbot;2024-09-02

3. Enhancing AI Chatbot Responses in Healthcare: The SMART Prompt Structure in Head and Neck Surgery;2024-08-23

4. Reliability of large language models for advanced head and neck malignancies management: a comparison between ChatGPT 4 and Gemini Advanced;European Archives of Oto-Rhino-Laryngology;2024-05-25

5. Applications of ChatGPT in Otolaryngology–Head Neck Surgery: A State of the Art Review;Otolaryngology–Head and Neck Surgery;2024-05-08