Author:
Gajjar Avi A., Valluri Harshitha, Prabhala Tarun, Custozzo Amanda, Boulos Alan S., Dalfino John C., Field Nicholas C., Paul Alexandra R.
Abstract
Introduction
Artificial intelligence (AI) has significant potential in medicine, especially in diagnostics and education. ChatGPT has performed at a level comparable to medical students on text-based USMLE questions, yet its performance on image-based questions has not been well evaluated.
Methods
This study evaluated ChatGPT-4's performance on image-based questions from USMLE Step 1, Step 2, and Step 3. A total of 376 questions, including 54 image-based questions, were tested; an image-captioning system was used to generate text descriptions of the images.
Results
Overall accuracy was 85.7% on Step 1, 92.5% on Step 2, and 86.9% on Step 3. On image-based questions, accuracy was 70.8% for Step 1, 92.9% for Step 2, and 62.5% for Step 3, compared with 89.5%, 92.5%, and 90.1% on text-based questions. Accuracy dropped significantly on difficult image-based questions in Steps 1 and 3 (p=0.0196 and p=0.0020, respectively), but not in Step 2 (p=0.9574). Despite these challenges, accuracy on image-based questions exceeded the passing rate for all three exams.
Conclusions
ChatGPT-4 can answer image-based USMLE questions above the passing rate, showing promise for its use in medical education and diagnostics. Further development is needed to improve its direct image-processing capabilities and overall performance.
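The abstract does not specify the captioning model, the prompt wording, or how answers were scored. The sketch below illustrates one way such a caption-then-query pipeline could be wired up with the OpenAI Python client; caption_image is a hypothetical placeholder for whatever image-captioning system the authors used, and the prompt format is an assumption.

```python
# Hypothetical sketch of a caption-then-query pipeline like the one described
# in Methods. caption_image() and the prompt wording are placeholders; the
# study's actual captioning system and prompts are not given in the abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def caption_image(image_path: str) -> str:
    """Placeholder for the image-captioning system used in the study."""
    raise NotImplementedError("substitute a captioning model here")


def answer_usmle_question(stem: str, choices: dict[str, str],
                          image_path: str | None = None) -> str:
    """Ask GPT-4 a multiple-choice question, injecting an image caption
    in place of the image when one is present."""
    parts = [stem]
    if image_path is not None:
        parts.append(f"Image description: {caption_image(image_path)}")
    parts.append("\n".join(f"{k}. {v}" for k, v in choices.items()))
    parts.append("Answer with the single letter of the best choice.")
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "\n\n".join(parts)}],
    )
    return response.choices[0].message.content.strip()
```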
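The abstract reports p-values for the accuracy drop on difficult image-based questions without naming the statistical test. For small 2x2 counts of correct versus incorrect answers, Fisher's exact test is one plausible choice; a minimal sketch follows, with placeholder counts since the per-difficulty counts are not reported in the abstract.

```python
# Hypothetical sketch: comparing accuracy on difficult vs. easier image-based
# questions with Fisher's exact test. The abstract names neither the test nor
# the per-cell counts; the numbers below are placeholders, not study data.
from scipy.stats import fisher_exact

correct_easy, wrong_easy = 12, 2   # placeholder counts
correct_hard, wrong_hard = 3, 7    # placeholder counts

table = [[correct_easy, wrong_easy],
         [correct_hard, wrong_hard]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
```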
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.