Vision-Language Models for Feature Detection of Macular Diseases on Optical Coherence Tomography

Published:2024-06-01 Issue:6 Volume:142 Page:573
ISSN:2168-6165
Container-title:JAMA Ophthalmology
language:en
Short-container-title:JAMA Ophthalmol

Author:

Antaki Fares¹²³, Chopra Reena¹²⁴, Keane Pearse A.¹²⁴

Affiliation:

1. Institute of Ophthalmology, University College London, London, United Kingdom

2. Moorfields Eye Hospital National Health Service Foundation Trust, London, United Kingdom

3. The Centre Hospitalier de l’Université de Montréal School of Artificial Intelligence in Healthcare, Montreal, Quebec, Canada

4. National Institute for Health and Care Research Biomedical Research Centre at Moorfields Eye Hospital National Health Service Foundation Trust, London, United Kingdom

Abstract

Importance: Vision-language models (VLMs) are a novel artificial intelligence technology capable of processing image and text inputs. While demonstrating strong generalist capabilities, their performance in ophthalmology has not been extensively studied.

Objective: To assess the performance of the Gemini Pro VLM in expert-level tasks for macular diseases from optical coherence tomography (OCT) scans.

Design, Setting, and Participants: This was a cross-sectional diagnostic accuracy study evaluating a generalist VLM on ophthalmology-specific tasks using the open-source Optical Coherence Tomography Image Database. The dataset included OCT B-scans from 50 unique patients: healthy individuals and those with macular hole, diabetic macular edema, central serous chorioretinopathy, and age-related macular degeneration. Each OCT scan was labeled for 10 key pathological features, referral recommendations, and treatments. The images were captured using a Cirrus high-definition OCT machine (Carl Zeiss Meditec) at Sankara Nethralaya Eye Hospital, Chennai, India, and the dataset was published in December 2018. Image acquisition dates were not specified.

Exposures: Gemini Pro, using a standard prompt to extract structured responses on December 15, 2023.

Main Outcomes and Measures: The primary outcome was model responses compared against expert labels, calculating F1 scores for each pathological feature. Secondary outcomes included accuracy in diagnosis, referral urgency, and treatment recommendation. The model’s internal concordance was evaluated by measuring the alignment between referral and treatment recommendations, independent of diagnostic accuracy.

Results: The mean F1 score was 10.7% (95% CI, 2.4-19.2). Measurable F1 scores were obtained for macular hole (36.4%; 95% CI, 0-71.4), pigment epithelial detachment (26.1%; 95% CI, 0-46.2), subretinal hyperreflective material (24.0%; 95% CI, 0-45.2), and subretinal fluid (20.0%; 95% CI, 0-45.5). A correct diagnosis was achieved in 17 of 50 cases (34%; 95% CI, 22-48). Referral recommendations varied: 28 of 50 were correct (56%; 95% CI, 42-70), 10 of 50 were overcautious (20%; 95% CI, 10-32), and 12 of 50 were undercautious (24%; 95% CI, 12-36). Referral and treatment concordance were very high, with 48 of 50 (96%; 95% CI, 90-100) and 48 of 49 (98%; 95% CI, 94-100) correct answers, respectively.

Conclusions and Relevance: In this study, a generalist VLM demonstrated limited vision capabilities for feature detection and management of macular disease. However, it showed low self-contradiction, suggesting strong language capabilities. As VLMs continue to improve, validating their performance on large benchmarking datasets will help ascertain their potential in ophthalmology.
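
The outcome measures named in the abstract (per-feature F1 scores and proportions with 95% CIs) can be illustrated with a minimal sketch. The abstract does not state how the confidence intervals were derived, so the percentile bootstrap used here is an assumption, not the authors' documented method. The label vectors in the F1 example are hypothetical, and the function names `f1_score` and `bootstrap_ci` are illustrative. Only the 17-of-50 diagnosis count is taken from the abstract.

```python
import random
from statistics import mean


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(values, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed method): resample with replacement,
    recompute the statistic, and take the alpha/2 and 1-alpha/2 percentiles."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical expert labels vs. model outputs for one pathological feature.
expert = [1, 1, 0, 0, 1]
model = [1, 0, 0, 1, 1]
feature_f1 = f1_score(expert, model)  # TP=2, FP=1, FN=1

# Diagnosis accuracy from the abstract: 17 correct out of 50 cases.
correct = [1] * 17 + [0] * 33
accuracy = mean(correct)
lo, hi = bootstrap_ci(correct, mean)
print(f"F1={feature_f1:.3f}  accuracy={accuracy:.2f}  95% CI=({lo:.2f}, {hi:.2f})")
```

With only 50 cases, intervals of this kind are wide, which is consistent with the broad CIs the abstract reports for the per-feature F1 scores.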

Publisher

American Medical Association (AMA)

Link

https://jamanetwork.com/journals/jamaophthalmology/articlepdf/2818270/jamaophthalmology_antaki_2024_br_240003_1718141488.97693.pdf

References: 11 articles.

1. Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings; Antaki; Ophthalmol Sci, 2023

2. Capabilities of GPT-4 in ophthalmology: an analysis of model entropy and progress towards human-level medical question answering; Antaki; Br J Ophthalmol, 2023

3. OCTID: optical coherence tomography image database; Gholami; Comput Electr Eng, 2020

4. OCTID: optical coherence tomography image database; Gholami; arXiv, 2018

5. Clinically applicable deep learning for diagnosis and referral in retinal disease; De Fauw; Nat Med, 2018

Cited by 5 articles.

1. Novel artificial intelligence for diabetic retinopathy and diabetic macular edema: what is new in 2024?; Current Opinion in Ophthalmology; 2024-08-27

2. The Diagnostic Accuracy of Vision-Language Model on Clinical Images Among Skin of Colour Phototypes; Journal of Cutaneous Medicine and Surgery; 2024-08-05

3. Capabilities of GPT-4o and Gemini 1.5 Pro in Gram stain and bacterial shape identification; Future Microbiology; 2024-07-29

4. Generative artificial intelligence in ophthalmology: current innovations, future applications and challenges; British Journal of Ophthalmology; 2024-06-26

5. Multimodal Machine Learning Enables AI Chatbot to Diagnose Ophthalmic Diseases and Provide High-Quality Medical Responses: A Model Development and Multicenter Study; 2024