Accuracy of an Artificial Intelligence Chatbot’s Interpretation of Clinical Ophthalmic Images

Authors:

Mihalache Andrew (1), Huang Ryan S. (1), Popovic Marko M. (2), Patil Nikhil S. (3), Pandya Bhadra U. (1), Shor Reut (2), Pereira Austin (2), Kwok Jason M. (1, 2), Yan Peng (1, 2), Wong David T. (2, 4), Kertes Peter J. (2, 5), Muni Rajeev H. (2, 4)

Affiliations:

1. Temerty School of Medicine, University of Toronto, Toronto, Ontario, Canada

2. Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada

3. Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada

4. Department of Ophthalmology, St Michael’s Hospital/Unity Health Toronto, Toronto, Ontario, Canada

5. John and Liz Tory Eye Centre, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada

Abstract

Importance: Ophthalmology relies on effective interpretation of multimodal imaging to ensure diagnostic accuracy. The new ability of ChatGPT-4 (OpenAI) to interpret ophthalmic images has not yet been explored.

Objective: To evaluate the performance of the novel release of an artificial intelligence chatbot that is capable of processing imaging data.

Design, Setting, and Participants: This cross-sectional study used a publicly available dataset of ophthalmic cases from OCTCases, a medical education platform based in the Department of Ophthalmology and Vision Sciences at the University of Toronto, with accompanying clinical multimodal imaging and multiple-choice questions. Of 137 available cases, 136 (99%) contained multiple-choice questions.

Exposures: The chatbot answered questions requiring multimodal input from October 16 to October 23, 2023.

Main Outcomes and Measures: The primary outcome was the accuracy of the chatbot in answering multiple-choice questions pertaining to image recognition in ophthalmic cases, measured as the proportion of correct responses. χ² tests were conducted to compare the proportion of correct responses across ophthalmic subspecialties.

Results: A total of 429 multiple-choice questions from 136 ophthalmic cases and 448 images were included in the analysis. The chatbot answered 299 of 429 multiple-choice questions correctly across all cases (70%). The chatbot performed better on retina questions than on neuro-ophthalmology questions (77% vs 58%; difference = 18%; 95% CI, 7.5%-29.4%; χ²₁ = 11.4; P < .001). It also performed better on non-image-based questions than on image-based questions (82% vs 65%; difference = 17%; 95% CI, 7.8%-25.1%; χ²₁ = 12.2; P < .001). The chatbot performed best on questions in the retina category (77% correct) and worst in the neuro-ophthalmology category (58% correct). It demonstrated intermediate performance on questions from the ocular oncology (72% correct), pediatric ophthalmology (68% correct), uveitis (67% correct), and glaucoma (61% correct) categories.

Conclusions and Relevance: In this study, the recent version of the chatbot correctly answered approximately two-thirds of multiple-choice questions pertaining to ophthalmic cases based on imaging interpretation. The multimodal chatbot performed better on questions that did not rely on the interpretation of imaging modalities. As the use of multimodal chatbots becomes increasingly widespread, it is imperative to emphasize their appropriate integration within medical contexts.
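The comparisons reported in the Results rest on two standard calculations for two independent proportions: the difference in the proportion of correct answers with a 95% confidence interval, and a χ² test on the 2×2 table of correct versus incorrect answers by group. The abstract does not report the per-category question counts, the confidence interval method, or whether a continuity correction was applied. The Python sketch below therefore assumes a Wald interval and an uncorrected Pearson χ² test. The function name `compare_proportions` and the example counts are hypothetical and do not reproduce the study's data; only the overall accuracy (299 of 429) comes from the abstract.

```python
import math


def compare_proportions(correct_a, total_a, correct_b, total_b, z=1.96):
    """Hypothetical helper: compare the proportion of correct answers in two groups.

    Assumes a Wald (normal-approximation) CI for the difference and a Pearson
    chi-square test with 1 degree of freedom and no continuity correction.
    All expected cell counts must be nonzero.
    """
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    diff = p_a - p_b

    # Wald standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / total_a + p_b * (1 - p_b) / total_b)
    ci = (diff - z * se, diff + z * se)

    # 2x2 table: rows are groups, columns are (correct, incorrect)
    observed = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [correct_a + correct_b, n - (correct_a + correct_b)]

    # Pearson chi-square: sum of (observed - expected)^2 / expected over all cells
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 df, chi2 = Z^2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_a, p_b, diff, ci, chi2, p_value


if __name__ == "__main__":
    # Overall accuracy reported in the abstract: 299 of 429 questions
    print(f"Overall accuracy: {299 / 429:.1%}")  # Overall accuracy: 69.7%

    # Hypothetical counts for illustration only (not the study's data)
    p_a, p_b, diff, ci, chi2, p = compare_proportions(30, 50, 20, 50)
    print(f"difference = {diff:.1%}; 95% CI, {ci[0]:.1%} to {ci[1]:.1%}; "
          f"chi2 = {chi2:.1f}; P = {p:.3f}")
```

With 1 degree of freedom, the uncorrected Pearson χ² statistic for a 2×2 table equals the square of the pooled two-proportion z statistic, which is why the p-value can be obtained from `math.erfc` without a statistics library.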

Publisher

American Medical Association (AMA)

Cited by 23 articles.