ChatGPT‐4 Consistency in Interpreting Laryngeal Clinical Images of Common Lesions and Disorders

Authors:

Antonino Maniaci (1, 2); Carlos M. Chiesa‐Estomba (1, 3, 4); Jérôme R. Lechien (1, 5, 6)

Affiliation:

1. Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France

2. Department of Medicine and Surgery, Kore University, Enna, Italy

3. Division of Laryngology and Broncho‐esophagology, Department of Otolaryngology–Head and Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium

4. Department of Otorhinolaryngology–Head and Neck Surgery, Donostia University Hospital, Donostia‐San Sebastián, Spain

5. Department of Otorhinolaryngology and Head and Neck Surgery, Foch Hospital, Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Paris Saclay University, Paris, France

6. Department of Otorhinolaryngology and Head and Neck Surgery, CHU Saint‐Pierre, Brussels, Belgium

Abstract

Objective

To investigate the consistency of Chatbot Generative Pretrained Transformer (ChatGPT)‐4 in the analysis of clinical pictures of common laryngological conditions.

Study Design

Prospective uncontrolled study.

Setting

Multicenter study.

Methods

Patient history and clinical videolaryngostroboscopic images were presented to ChatGPT‐4 for differential diagnoses, management, and treatment(s). ChatGPT‐4 responses were assessed by 3 blinded laryngologists with the artificial intelligence performance instrument (AIPI). The complexity of cases and the consistency between practitioners and ChatGPT‐4 in interpreting clinical images were evaluated with a 5‐point Likert scale. The intraclass correlation coefficient (ICC) was used to measure the strength of interrater agreement.

Results

Forty patients with a mean complexity score of 2.60 ± 1.15 were included. The mean consistency score for ChatGPT‐4 image interpretation was 2.46 ± 1.42. ChatGPT‐4 analyzed the clinical images perfectly in 6 cases (15%; 5/5), while consistency between ChatGPT‐4 and the judges was high in 5 cases (12.5%; 4/5). Judges reported an ICC of 0.965 for the consistency score (P = .001). ChatGPT‐4 erroneously documented vocal fold irregularity (mass or lesion), glottic insufficiency, and vocal cord paralysis in 21 (52.5%), 2 (5%), and 5 (12.5%) cases, respectively. ChatGPT‐4 and practitioners indicated 153 and 63 additional examinations, respectively (P = .001). The ChatGPT‐4 primary diagnosis was correct in 20.0% to 25.0% of cases. The clinical image consistency score was significantly associated with the AIPI score (rs = 0.830; P = .001).

Conclusion

ChatGPT‐4 performed better at establishing a primary diagnosis than at analyzing clinical images or at selecting the most appropriate additional examinations and treatments.

Publisher

Wiley
