ChatGPT’s diagnostic performance based on textual vs. visual information compared to radiologists’ diagnostic performance in musculoskeletal radiology

Published: 2024-07-12
ISSN: 1432-1084
Container-title: European Radiology
Language: en
Short-container-title: Eur Radiol

Author:

Horiuchi Daisuke, Tatekawa Hiroyuki, Oura Tatsushi, Shimono Taro, Walston Shannon L., Takita Hirotaka, Matsushita Shu, Mitsuyama Yasuhito, Miki Yukio, Ueda Daiju

Abstract

Objectives: To compare the diagnostic accuracy of Generative Pre-trained Transformer (GPT)-4-based ChatGPT, GPT-4 with vision (GPT-4V)-based ChatGPT, and radiologists in musculoskeletal radiology.

Materials and methods: We included 106 “Test Yourself” cases from Skeletal Radiology between January 2014 and September 2023. We input the medical history and imaging findings into GPT-4-based ChatGPT and the medical history and images into GPT-4V-based ChatGPT, then both generated a diagnosis for each case. Two radiologists (a radiology resident and a board-certified radiologist) independently provided diagnoses for all cases. The diagnostic accuracy rates were determined based on the published ground truth. Chi-square tests were performed to compare the diagnostic accuracy of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists.

Results: GPT-4-based ChatGPT significantly outperformed GPT-4V-based ChatGPT (p < 0.001), with accuracy rates of 43% (46/106) and 8% (9/106), respectively. The radiology resident and the board-certified radiologist achieved accuracy rates of 41% (43/106) and 53% (56/106), respectively. The diagnostic accuracy of GPT-4-based ChatGPT was comparable to that of the radiology resident but lower than that of the board-certified radiologist, although the differences were not significant (p = 0.78 and 0.22, respectively). The diagnostic accuracy of GPT-4V-based ChatGPT was significantly lower than that of both radiologists (p < 0.001 and < 0.001, respectively).

Conclusion: GPT-4-based ChatGPT demonstrated significantly higher diagnostic accuracy than GPT-4V-based ChatGPT. While GPT-4-based ChatGPT’s diagnostic performance was comparable to radiology residents, it did not reach the performance level of board-certified radiologists in musculoskeletal radiology.

Clinical relevance statement: GPT-4-based ChatGPT outperformed GPT-4V-based ChatGPT and was comparable to radiology residents, but it did not reach the level of board-certified radiologists in musculoskeletal radiology. Radiologists should comprehend ChatGPT’s current performance as a diagnostic tool for optimal utilization.

Key Points

This study compared the diagnostic performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists in musculoskeletal radiology.

GPT-4-based ChatGPT was comparable to radiology residents, but did not reach the level of board-certified radiologists.

When utilizing ChatGPT, it is crucial to input appropriate descriptions of imaging findings rather than the images.
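
The accuracy comparisons reported in the abstract can be sketched as 2×2 chi-square tests on the correct/incorrect counts. The abstract does not say which chi-square variant was used, so the following is only a minimal sketch that assumes a Pearson chi-square test with Yates' continuity correction on independent 2×2 tables. The function name `yates_chi_square_2x2` is hypothetical and does not come from the paper. Under that assumption, the p values for GPT-4-based ChatGPT versus the resident and versus the board-certified radiologist come out at roughly 0.78 and 0.22, in line with the reported figures.

```python
import math


def yates_chi_square_2x2(correct_a, correct_b, n_cases):
    """Compare two accuracy rates measured on n_cases cases each.

    Assumption (not stated in the abstract): a 2x2 Pearson chi-square
    test with Yates' continuity correction. Returns (chi2, p_value).
    """
    a, b = correct_a, n_cases - correct_a  # reader A: correct, incorrect
    c, d = correct_b, n_cases - correct_b  # reader B: correct, incorrect
    total = a + b + c + d
    # Yates' correction shrinks |ad - bc| by total/2, clamped at zero.
    diff = max(abs(a * d - b * c) - total / 2, 0.0)
    chi2 = total * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Correct diagnoses out of 106 cases, taken from the abstract.
N_CASES = 106
comparisons = {
    "GPT-4 vs GPT-4V": (46, 9),
    "GPT-4 vs resident": (46, 43),
    "GPT-4 vs board-certified": (46, 56),
}

for label, (x, y) in comparisons.items():
    chi2, p = yates_chi_square_2x2(x, y, N_CASES)
    print(f"{label}: chi2 = {chi2:.2f}, p = {p:.3g}")
```

The closed form `erfc(sqrt(chi2 / 2))` is valid only for one degree of freedom, which is what a 2×2 table has. A statistics package would be the usual choice for larger tables.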

Funder

Guerbet

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s00330-024-10902-5.pdf

References (36 articles)

1. OpenAI (2023) GPT-4 technical report. arXiv [cs.CL]. https://doi.org/10.48550/arXiv.2303.08774

2. Brown TB, Mann B, Ryder N et al (2020) Language models are few-shot learners. arXiv [cs.CL]. https://doi.org/10.48550/arXiv.2005.14165

3. Bubeck S, Chandrasekaran V, Eldan R et al (2023) Sparks of artificial general intelligence: early experiments with GPT-4. arXiv [cs.CL]. https://doi.org/10.48550/arXiv.2303.12712

4. Eloundou T, Manning S, Mishkin P, Rock D (2023) GPTs are GPTs: an early look at the labor market impact potential of large language models. arXiv [econ.GN]. https://doi.org/10.48550/arXiv.2303.10130

5. OpenAI (2023) GPT-4V(ision) system card. Available via https://openai.com/research/gpt-4v-system-card. Accessed Oct 13 2023

Cited by 1 article.

1. Evaluating ChatGPT-4o in Diffusion-weighted Imaging Interpretation: Is it Useful? Academic Radiology, 2024-09