Spot the Difference: Can ChatGPT4-Vision Transform Radiology Artificial Intelligence?

Published: 2023-11-18

Author:

Kelly Brendan S, Duignan Sophie, Mathur Prateek, Dillon Henry, Lee Edward H, Yeom Kristen W, Keane Pearse, Lawlor Aonghus, Killeen Ronan P

Abstract

OpenAI’s flagship Large Language Model ChatGPT can now accept image input (GPT4V). “Spot the Difference” and “Medical” have been suggested as emerging applications. The interpretation of medical images is a dynamic process, not a static task. Diagnosis and treatment of Multiple Sclerosis (MS) is dependent on identification of radiologic change. We aimed to compare the zero-shot performance of GPT4V to a trained U-Net and Vision Transformer (ViT) for the identification of progression of MS on MRI.

170 patients were included. 100 unseen paired images were randomly used for testing. Both U-Net and ViT had 94% accuracy while GPT4V had 85%. GPT4V gave overly cautious non-answers in 6 cases. GPT4V had a precision, recall and F1 score of 0.896, 0.915 and 0.905, compared to 1.0, 0.88 and 0.936 for U-Net and 0.94, 0.94 and 0.94 for ViT.

The impressive performance compared to trained models and a no-code drag and drop interface suggest GPT4V has the potential to disrupt AI radiology research. However, misclassified cases, hallucinations and overly cautious non-answers confirm that it is not ready for clinical use. GPT4V’s widespread availability and relatively high error rate highlight the need for caution and education for lay-users, especially those with limited access to expert healthcare.

Key points

Even without fine-tuning and without the need for prior coding experience or additional hardware, GPT4V can perform a zero-shot radiologic change detection task with reasonable accuracy.

We find GPT4V does not match the performance of established state-of-the-art computer vision models. GPT4V’s performance metrics are more similar to those of the vision transformers than of the convolutional neural networks, giving some possible insight into its underlying architecture.

This is an exploratory experimental study and GPT4V is not intended for use as a medical device.

Summary statement

GPT4V can identify radiologic progression of Multiple Sclerosis in a simplified experimental setting. However, GPT4V is not a medical device, and its widespread availability and relatively high error rate highlight the need for caution and education for lay-users, especially those with limited access to expert healthcare.
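
As a reading aid, the F1 scores quoted in the abstract can be checked against the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` and the dictionary layout are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs exactly as reported in the abstract.
reported = {
    "GPT4V": (0.896, 0.915),
    "U-Net": (1.0, 0.88),
    "ViT": (0.94, 0.94),
}

# Recompute F1 for each model from its reported precision and recall.
f1_by_model = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_model.items():
    print(f"{name}: F1 = {f1:.3f}")
```

Rounded to three decimals, the recomputed values agree with the abstract's 0.905, 0.936 and 0.94. U-Net's perfect precision with lower recall would be consistent with it producing no false positives on the test set while missing some cases of progression.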

Publisher

Cold Spring Harbor Laboratory


Cited by 1 article.

1. Advancing medical imaging with language models: featuring a spotlight on ChatGPT;Physics in Medicine & Biology;2024-05-03