Evaluation of the accuracy and readability of ChatGPT-4 and Google Gemini in providing information on retinal detachment: a multicenter expert comparative study

Published:2024-09-02 Issue:1 Volume:10
ISSN:2056-9920
Container-title:International Journal of Retina and Vitreous
language:en
Short-container-title:Int J Retin Vitr

Author:

Strzalkowski Piotr, Strzalkowska Alicja, Chhablani Jay, Pfau Kristina, Errera Marie-Hélène, Roth Mathias, Schaub Friederike, Bechrakis Nikolaos E., Hoerauf Hans, Reiter Constantin, Schuster Alexander K., Geerling Gerd, Guthoff Rainer

Abstract

Background: Large language models (LLMs) such as ChatGPT-4 and Google Gemini show potential for patient health education, but concerns about their accuracy require careful evaluation. This study evaluates the readability and accuracy of ChatGPT-4 and Google Gemini in answering questions about retinal detachment.

Methods: Comparative study analyzing responses from ChatGPT-4 and Google Gemini to 13 retinal detachment questions, categorized by difficulty levels (D1, D2, D3). Masked responses were reviewed by ten vitreoretinal specialists and rated on correctness, errors, thematic accuracy, coherence, and overall quality grading. Analysis included Flesch Readability Ease Score, word and sentence counts.

Results: Both Artificial Intelligence tools required college-level understanding for all difficulty levels. Google Gemini was easier to understand (p = 0.03), while ChatGPT-4 provided more correct answers for the more difficult questions (p = 0.0005) with fewer serious errors. ChatGPT-4 scored highest on most challenging questions, showing superior thematic accuracy (p = 0.003). ChatGPT-4 outperformed Google Gemini in 8 of 13 questions, with higher overall quality grades in the easiest (p = 0.03) and hardest levels (p = 0.0002), showing a lower grade as question difficulty increased.

Conclusions: ChatGPT-4 and Google Gemini effectively address queries about retinal detachment, offering mostly accurate answers with few critical errors, though patients require higher education for comprehension. The implementation of AI tools may contribute to improving medical care by providing accurate and relevant healthcare information quickly.
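The readability measure named in the Methods is usually called the Flesch Reading Ease score, computed as 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The abstract does not say which tool or syllable counter the authors used, so the following is only a minimal sketch under assumptions: the function names (`count_syllables`, `text_stats`, `flesch_reading_ease`) are hypothetical, and the syllable counter is a rough vowel-group heuristic rather than a dictionary-based one.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the study's method):
    count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Return sentence, word, and syllable counts for a text."""
    # Sentences are split on runs of '.', '!' or '?'; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are alphabetic runs, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    return {"sentences": len(sentences), "words": len(words), "syllables": syllables}


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    s = text_stats(text)
    if s["sentences"] == 0 or s["words"] == 0:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = s["words"] / s["sentences"]
    syllables_per_word = s["syllables"] / s["words"]
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    dense = ("Rhegmatogenous retinal detachment necessitates immediate "
             "vitreoretinal surgical intervention.")
    print(text_stats(simple))
    print(round(flesch_reading_ease(simple), 3))
    print(flesch_reading_ease(dense) < flesch_reading_ease(simple))
```

Scores of roughly 30 to 50 are conventionally read as college-level text, which is the band implied by the finding that both tools required college-level understanding. Because syllable heuristics differ between tools, absolute scores from this sketch would not necessarily match the values reported in the study.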

Funder

Universitätsklinikum Düsseldorf. Anstalt öffentlichen Rechts

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s40942-024-00579-9.pdf

References (30 articles)

1. Hartzband P, Groopman J. Untangling the Web–patients, doctors, and the internet. N Engl J Med. 2010;362:1063–6.

2. Rich AS, Gureckis T. Lessons for artificial intelligence from the study of natural stupidity. Nat Mach Intell. 2019;1:174–80.

3. Powles J, Hodson H. Google DeepMind and healthcare in an age of algorithms. Health Technol. 2017;7:351–67.

4. Bini SA. Artificial intelligence, machine learning, deep learning, and cognitive computing: what do these terms mean and how will they impact health care? J Arthroplasty. 2018;33:2358–61.

5. Millenson ML, Baldwin JL, Zipperer L, Singh H. Beyond Dr. Google: the evidence on consumer-facing digital tools for diagnosis. Diagnosis (Berl). 2018;5:95–105.