ChatGPT vs UpToDate: comparative study of usefulness and reliability of Chatbot in common clinical presentations of otorhinolaryngology–head and neck surgery

Published:2024-01-13 Issue:4 Volume:281 Page:2145-2151
ISSN:0937-4477
Container-title:European Archives of Oto-Rhino-Laryngology
language:en
Short-container-title:Eur Arch Otorhinolaryngol

Author:

Karimov Ziya, Allahverdiyev Irshad, Agayarov Ozlem Yagiz, Demir Dogukan, Almuradova Elvina

Abstract

Purpose: The use of chatbots, a form of artificial intelligence, in medicine has been increasing in recent years. UpToDate® is a well-known search tool built on evidence-based knowledge and used daily by doctors worldwide. In this study, we aimed to investigate the usefulness and reliability of ChatGPT compared to UpToDate in Otorhinolaryngology and Head and Neck Surgery (ORL–HNS).

Materials and methods: ChatGPT-3.5 and UpToDate were queried on the management of 25 common clinical case scenarios (13 males/12 females) drawn from the literature and chosen to reflect daily practice at the Department of Otorhinolaryngology of Ege University Faculty of Medicine. Scientific references for the management were requested for each clinical case. Reviewers assessed the accuracy of the references in the ChatGPT answers on a 0–2 scale and the usefulness of the ChatGPT and UpToDate answers on a 1–3 scale. The UpToDate and ChatGPT-3.5 responses were then compared.

Results: Unlike UpToDate, ChatGPT did not give references for some questions. ChatGPT's information was limited to 2021. UpToDate supported its content with subheadings, tables, figures, and algorithms. The mean accuracy score of the references in ChatGPT answers was 0.25 (weak/unrelated). The median (Q1–Q3) usefulness score was 1.00 (1.25–2.00) for ChatGPT and 2.63 (2.75–3.00) for UpToDate; the difference was statistically significant (p < 0.001). UpToDate was found to be more useful and reliable than ChatGPT.

Conclusions: ChatGPT has the potential to help physicians find information, but our results suggest that it needs to be improved to increase the usefulness and reliability of the medical evidence-based knowledge it provides.

Funder

Ege University

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s00405-023-08423-w.pdf

References: 35 articles.

1. Knoedler L, Baecher H, Kauke-Navarro M, Prantl L, Machens HG, Scheuermann P, Palm C, Baumann R, Kehrer A, Panayi AC, Knoedler S (2022) Towards a reliable and rapid automated grading system in facial palsy patients: facial palsy surgery meets computer science. J Clin Med 11(17):4998. https://doi.org/10.3390/jcm11174998

2. Crowson MG, Dixon P, Mahmood R, Lee JW, Shipp D, Le T, Lin V, Chen J, Chan TCY (2020) Predicting postoperative cochlear implant performance using supervised machine learning. Otol Neurotol 41(8):e1013. https://doi.org/10.1097/MAO.0000000000002710

3. Wang B, Zheng J, Yu JF, Lin SY, Yan SY, Zhang LY, Wang SS, Cai SJ, Abdelhamid Ahmed AH, Lin LQ, Chen F, Randolph GW, Zhao WX (2022) Development of artificial intelligence for parathyroid recognition during endoscopic thyroid surgery. Laryngoscope 132(12):2516–2523. https://doi.org/10.1002/lary.30173

4. Qu RW, Qureshi U, Petersen G, Lee SC (2023) Diagnostic and management applications of ChatGPT in structured otolaryngology clinical scenarios. OTO Open 7(3):e67. https://doi.org/10.1002/oto2.67

5. Lim SJ, Jeon E, Baek N, Chung YH, Kim SY, Song I, Rah YC, Oh KH, Choi J (2023) Prediction of hearing prognosis after intact canal wall mastoidectomy with tympanoplasty using artificial intelligence. Otolaryngol Head Neck Surg. https://doi.org/10.1002/ohn.472

Cited by 7 articles.

1. Generative AI and Otolaryngology—Head & Neck Surgery;Otolaryngologic Clinics of North America;2024-10

2. Decoding the NCCN Guidelines With AI: A Comparative Evaluation of ChatGPT-4.0 and Llama 2 in the Management of Thyroid Carcinoma;The American Surgeon™;2024-08-13

3. “Pseudo” Intelligence or Misguided or Mis-sourced Intelligence?;The Annals of Thoracic Surgery;2024-07

4. Is ChatGPT smarter than Otolaryngology trainees? A comparison study of board style exam questions;2024-06-18

5. Accelerating editorial processes in scientific journals: Leveraging AI for rapid manuscript review;Oral Oncology Reports;2024-06