Exploring AI-chatbots’ capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases

Published: 2024-03-06
Page: bjo-2023-325143
ISSN: 0007-1161
Container-title: British Journal of Ophthalmology
Language: en
Short-container-title: Br J Ophthalmol

Author:

Carlà Matteo Mario, Gambini Gloria, Baldascino Antonio, Giannuzzi Federico, Boselli Francesco, Crincoli Emanuele, D’Onofrio Nicola Claudio, Rizzo Stanislao

Abstract

Background: We aimed to assess the capability of three publicly available large language models, Chat Generative Pretrained Transformer (ChatGPT-3.5), ChatGPT-4 and Google Gemini, in analysing retinal detachment cases and suggesting the best possible surgical planning.

Methods: We entered 54 retinal detachment records into the ChatGPT and Gemini interfaces. After asking ‘Specify what kind of surgical planning you would suggest and the eventual intraocular tamponade.’ and collecting the answers, we assessed the level of agreement with the consensus opinion of three expert vitreoretinal surgeons. In addition, ChatGPT and Gemini answers were graded 1–5 (from poor to excellent quality) according to the Global Quality Score (GQS).

Results: After excluding 4 controversial cases, 50 cases were included. Overall, the surgical choices of ChatGPT-3.5, ChatGPT-4 and Google Gemini agreed with those of the vitreoretinal surgeons in 40/50 (80%), 42/50 (84%) and 35/50 (70%) of cases, respectively. Google Gemini was not able to respond in five cases. Contingency analysis showed a significant difference between ChatGPT-4 and Gemini (p=0.03). ChatGPT’s GQS were 3.9±0.8 and 4.2±0.7 for versions 3.5 and 4, respectively, while Gemini scored 3.5±1.1. There was no statistically significant difference between the two ChatGPT versions (p=0.22), while both outperformed Gemini (p=0.03 and p=0.002, respectively). The main source of error was endotamponade choice (14% for ChatGPT-3.5 and 4, and 12% for Google Gemini). Only ChatGPT-4 was able to suggest a combined phacovitrectomy approach.

Conclusion: Google Gemini and ChatGPT evaluated vitreoretinal patients’ records in a coherent manner, showing a good level of agreement with expert surgeons. According to the GQS, ChatGPT’s recommendations were much more accurate and precise than Gemini’s.
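The agreement percentages reported in the abstract follow directly from the case counts. As a minimal sketch (the variable and function names are hypothetical and not taken from the paper), the arithmetic can be checked like this:

```python
# Agreement counts as reported in the abstract: number of cases (out of the
# 50 included) in which each model's surgical choice matched the surgeons'.
INCLUDED_CASES = 50
agreements = {"ChatGPT-3.5": 40, "ChatGPT-4": 42, "Google Gemini": 35}


def agreement_rate(agreed: int, total: int = INCLUDED_CASES) -> float:
    """Return the share of cases in agreement, as a percentage."""
    return 100.0 * agreed / total


# Hypothetical helper structure: model name -> agreement percentage.
rates = {model: agreement_rate(n) for model, n in agreements.items()}

for model, rate in rates.items():
    print(f"{model}: {agreements[model]}/{INCLUDED_CASES} = {rate:.0f}%")
```

This only reproduces the descriptive percentages; it does not attempt the contingency analysis, since the abstract does not say which statistical test was used.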

Publisher

BMJ

References: 27 articles.

1. Ozdemir S. Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs. Addison-Wesley Professional, 2023.

2. How AI responds to common lung cancer questions: ChatGPT vs Google Bard;Rahsepar;Radiology,2023

3. The role of ChatGPT, generative language models, and artificial intelligence in medical education: a conversation with ChatGPT and a call for papers;Eysenbach;JMIR Med Educ,2023

4. Large language models in medicine;Thirunavukarasu;Nat Med,2023

5. Large language models encode clinical knowledge;Singhal;Nature,2023

Cited by 12 articles.

1. Testing the power of Google DeepMind: Gemini versus ChatGPT 4 facing a European ophthalmology examination;AJO International;2024-10

2. Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review;2024-08-12

3. Fixed-Point Arithmetic Analysis for Development of LLaMA 3 On-Device Accelerator;JOURNAL OF BROADCAST ENGINEERING;2024-07-31

4. Utilizing Linguistic and Acoustic features from Arabic Transcripts for Early Detecting Alzheimer’s Disease Using Different Machine Learning Algorithms;2024 IEEE 7th International Conference on Advanced Technologies, Signal and Image Processing (ATSIP);2024-07-11

5. Artificial Versus Human Intelligence in the Diagnostic Approach of Ophthalmic Case Scenarios: A Qualitative Evaluation of Performance and Consistency;Cureus;2024-06-16