Evaluation of a digital ophthalmologist app built by GPT4-V(ision)

Published: 2023-11-27

Author:

Xu Pusheng, Chen Xiaolan, Zhao Ziwei, Zheng Yingfeng, Jin Guangming, Shi Danli, He Mingguang

Abstract

Background: GPT4-V(ision) has generated great interest across various fields, but its performance on ocular multimodal images is still unknown. This study evaluates the capabilities of a GPT-4V-based chatbot in answering queries related to ocular multimodal images.

Methods: A digital ophthalmologist app was built on GPT-4V. The evaluation dataset comprised six ocular imaging modalities: slit-lamp, scanning laser ophthalmoscopy (SLO), fundus photography of the posterior pole (FPP), optical coherence tomography (OCT), fundus fluorescein angiography (FFA), and ocular ultrasound (OUS). Each modality included images representing 5 common and 5 rare diseases. The chatbot was presented with ten questions per image, covering examination identification, lesion detection, diagnosis, decision support, and the repeatability of diagnosis. The responses of GPT-4V were evaluated for accuracy, usability, and safety.

Results: There was substantial agreement among the three ophthalmologists. Of 600 responses, 30.5% were accurate; of 540 responses, 22.8% were highly usable and 55.5% were considered safe by the ophthalmologists. The chatbot performed best on slit-lamp images, with 42.0%, 42.2%, and 68.5% of responses rated accurate, highly usable, and causing no harm, respectively. Its performance was notably weaker on FPP images, with only 13.7%, 3.7%, and 38.5% in the same categories. The chatbot correctly identified 95.6% of the imaging modalities. For lesion identification, diagnosis, and decision support, its accuracy was 25.6%, 16.1%, and 24.0%, respectively. For common diseases, the average proportions of responses rated correct, highly usable, and causing no harm were 37.9%, 30.5%, and 60.1%, respectively. All three proportions were higher than those for rare diseases, which were 23.2% (P<0.001), 15.2% (P<0.001), and 51.1% (P=0.032), respectively. The overall repeatability of GPT-4V in diagnosing ocular images was 63% (38/60).

Conclusion: Currently, GPT-4V lacks the reliability required for clinical decision-making and patient consultation in ophthalmology. Ongoing refinement and testing are essential for improving the efficacy of large language models in medical applications.
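The counts quoted in the abstract can be cross-checked with a little arithmetic. The Python sketch below is illustrative only and is not the authors' analysis code. It assumes one image per disease per modality, which the abstract does not state but which is consistent with the reported 600 responses and the 38/60 repeatability figure. All names in it are hypothetical.

```python
# Illustrative cross-check of the figures in the abstract (not the authors' code).
MODALITIES = ["slit-lamp", "SLO", "FPP", "OCT", "FFA", "OUS"]
COMMON_DISEASES_PER_MODALITY = 5
RARE_DISEASES_PER_MODALITY = 5
QUESTIONS_PER_IMAGE = 10
IMAGES_PER_DISEASE = 1  # ASSUMPTION: not stated in the abstract


def study_counts():
    """Return the number of images and responses implied by the study design."""
    diseases = COMMON_DISEASES_PER_MODALITY + RARE_DISEASES_PER_MODALITY
    images = len(MODALITIES) * diseases * IMAGES_PER_DISEASE
    responses = images * QUESTIONS_PER_IMAGE
    return {"images": images, "responses": responses}


def percentage(part, total):
    """Express part/total as a percentage."""
    return 100.0 * part / total


counts = study_counts()
print(counts)                       # images and total responses
print(round(percentage(38, 60)))    # repeatability, reported as 63%
print(540 / counts["images"])       # responses per image rated for usability and safety
```

Under this assumption the design yields 60 images and 600 responses. The 540 responses rated for usability and safety correspond to nine per image, which suggests that one of the ten questions was not rated on those two criteria; the abstract does not say which.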

Publisher

Cold Spring Harbor Laboratory

References: 17 articles.

1. Dave T, Athaluri SA, Singh S: ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Frontiers in Artificial Intelligence 2023, 6:1169595.

2. New meaning for NLP: the trials and tribulations of natural language processing with GPT-3 in ophthalmology

3. Raimondi R, Tzoumas N, Salisbury T, Di Simplicio S, Romano MR, (NETRiON) NETRiON, Bommireddy T, Chawla H, Chen Y, Connolly S et al: Comparative analysis of large language models in the Royal College of Ophthalmologists fellowship exams. Eye 2023.

4. Duval R: Evaluating the Performance of ChatGPT in Ophthalmology. Ophthalmology Science 2023.

5. Momenaei B, Wakabayashi T, Shahlaee A, Durrani AF, Pandit SA, Wang K, Mansour HA, Abishek RM, Xu D, Sridhar J et al: Appropriateness and Readability of ChatGPT-4 generated Responses for Surgical Treatment of Retinal Diseases. Ophthalmology Retina 2023:S2468653023002464.

Cited by 3 articles.

1. FFA-GPT: an automated pipeline for fundus fluorescein angiography interpretation and question-answer. npj Digital Medicine, 2024-05-03.

2. ICGA-GPT: report generation and question answering for indocyanine green angiography images. British Journal of Ophthalmology, 2024-03-20.

3. Utility of artificial intelligence‐based large language models in ophthalmic care. Ophthalmic and Physiological Optics, 2024-02-25.