ChatGPT‐4 Consistency in Interpreting Laryngeal Clinical Images of Common Lesions and Disorders

Authors:

Antonino Maniaci (1, 2); Carlos M. Chiesa‐Estomba (1, 3, 4); Jérôme R. Lechien (1, 5, 6)

Affiliation:

1. Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France

2. Department of Medicine and Surgery, Kore University, Enna, Italy

3. Division of Laryngology and Broncho‐esophagology, Department of Otolaryngology–Head and Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium

4. Department of Otorhinolaryngology–Head and Neck Surgery, Donostia University Hospital, Donostia‐San Sebastián, Spain

5. Department of Otorhinolaryngology and Head and Neck Surgery, Foch Hospital, Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Paris Saclay University, Paris, France

6. Department of Otorhinolaryngology and Head and Neck Surgery, CHU Saint‐Pierre, Brussels, Belgium

Abstract

Objective

To investigate the consistency of Chatbot Generative Pretrained Transformer (ChatGPT)‐4 in the analysis of clinical pictures of common laryngological conditions.

Study Design

Prospective uncontrolled study.

Setting

Multicenter study.

Methods

Patient history and clinical videolaryngostroboscopic images were presented to ChatGPT‐4 for differential diagnoses, management, and treatment(s). ChatGPT‐4 responses were assessed by 3 blinded laryngologists with the artificial intelligence performance instrument (AIPI). The complexity of cases and the consistency between practitioners and ChatGPT‐4 in interpreting clinical images were evaluated with a 5‐point Likert scale. The intraclass correlation coefficient (ICC) was used to measure the strength of interrater agreement.

Results

Forty patients with a mean complexity score of 2.60 ± 1.15 were included. The mean consistency score for ChatGPT‐4 image interpretation was 2.46 ± 1.42. ChatGPT‐4 analyzed the clinical images perfectly in 6 cases (15%; 5/5), while consistency between ChatGPT‐4 and the judges was high in 5 cases (12.5%; 4/5). Judges reported an ICC of 0.965 for the consistency score (P = .001). ChatGPT‐4 erroneously documented vocal fold irregularity (mass or lesion), glottic insufficiency, and vocal cord paralysis in 21 (52.5%), 2 (5%), and 5 (12.5%) cases, respectively. ChatGPT‐4 and practitioners indicated 153 and 63 additional examinations, respectively (P = .001). The ChatGPT‐4 primary diagnosis was correct in 20.0% to 25.0% of cases. The clinical image consistency score was significantly associated with the AIPI score (rs = 0.830; P = .001).

Conclusion

ChatGPT‐4 performed better at establishing a primary diagnosis than at analyzing clinical images or at selecting the most appropriate additional examinations and treatments.

Publisher

Wiley
