Performance of ChatGPT, human radiologists, and context-aware ChatGPT in identifying AO codes from radiology reports

Published:2023-08-30 Issue:1 Volume:13 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Russe Maximilian F., Fink Anna, Ngo Helen, Tran Hien, Bamberg Fabian, Reisert Marco, Rau Alexander

Abstract

While radiologists can describe a fracture’s morphology and complexity with ease, the translation into classification systems such as the Arbeitsgemeinschaft Osteosynthesefragen (AO) Fracture and Dislocation Classification Compendium is more challenging. We tested the performance of generic chatbots and chatbots aware of specific knowledge of the AO classification provided by a vector-index and compared it to human readers. In the 100 radiological reports we created based on random AO codes, chatbots provided AO codes significantly faster than humans (mean 3.2 s per case vs. 50 s per case, p < .001) though not reaching human performance (max. chatbot performance of 86% correct full AO codes vs. 95% in human readers). In general, chatbots based on GPT 4 outperformed the ones based on GPT 3.5-Turbo. Further, we found that providing specific knowledge substantially enhances the chatbot’s performance and consistency, as the context-aware chatbot based on GPT 4 provided 71% consistent correct full AO codes compared to the 2% consistent correct full AO codes for the generic ChatGPT 4. This provides evidence that refining and providing specific context to ChatGPT will be the next essential step in harnessing its power.

Funder

Universitätsklinikum Freiburg

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-023-41512-8.pdf

Reference: 17 articles.

1. Hallas, P. & Ellingsen, T. Errors in fracture diagnoses in the emergency department—Characteristics of patients and diurnal variation. BMC Emerg. Med. 6, 4 (2006).

2. Shehovych, A., Salar, O., Meyer, C. & Ford, D. Adult distal radius fractures classification systems: essential clinical knowledge or abstract memory testing?. Ann. R. Coll. Surg. Engl. 98, 525–531 (2016).

3. Kung, T. H. et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit. Health 2, e0000198 (2023).

4. Nori, H., King, N., McKinney, S. M., Carignan, D. & Horvitz, E. Capabilities of GPT-4 on medical challenge problems (2023). https://doi.org/10.48550/ARXIV.2303.13375

5. Buvat, I. & Weber, W. Nuclear medicine from a novel perspective: Buvat and Weber Talk with OpenAI’s ChatGPT. J. Nucl. Med. Off. Publ. Soc. Nucl. Med. 64, 505–507 (2023).

Cited by 17 articles.

1. A future role for health applications of large language models depends on regulators enforcing safety standards;The Lancet Digital Health;2024-09

2. Evaluating Artificial Intelligence Competency in Education: Performance of ChatGPT-4 in the American Registry of Radiologic Technologists (ARRT) Radiography Certification Exam;Academic Radiology;2024-08

3. Patient-centered radiology reports with generative artificial intelligence: adding value to radiology reporting;Scientific Reports;2024-06-08

4. Performance evaluation of ChatGPT in detecting diagnostic errors and their contributing factors: an analysis of 545 case reports of diagnostic errors;BMJ Open Quality;2024-06

5. Opportunities and challenges in the application of large artificial intelligence models in radiology;Meta-Radiology;2024-06