Preliminary assessment of automated radiology report generation with generative pre-trained transformers: comparing results to radiologist-generated reports

Published: 2023-09-15; Volume: 42; Issue: 2; Pages: 190–200
ISSN: 1867-1071
Journal: Japanese Journal of Radiology
Language: English
Journal abbreviation: Jpn J Radiol

Authors:

Nakaura Takeshi, Yoshida Naofumi, Kobayashi Naoki, Shiraishi Kaori, Nagayama Yasunori, Uetani Hiroyuki, Kidoh Masafumi, Hokamura Masamichi, Funama Yoshinori, Hirai Toshinori

Abstract

Purpose: In this preliminary study, we aimed to evaluate the potential of the generative pre-trained transformer (GPT) series to generate radiology reports from concise imaging findings and to compare their performance with that of radiologist-generated reports.

Methods: This retrospective study included 28 patients who underwent computed tomography (CT) and had a diagnosed disease with typical imaging findings. Radiology reports were generated with GPT-2, GPT-3.5, and GPT-4 from each patient's age, gender, disease site, and imaging findings. We calculated the top-1 and top-5 accuracies and the mean average precision (MAP) of the differential diagnoses for GPT-2, GPT-3.5, GPT-4, and radiologists. Two board-certified radiologists rated the grammar and readability, imaging findings, impression, differential diagnosis, and overall quality of all reports on a 4-point scale.

Results: Radiologists achieved the highest top-1 and top-5 accuracies for the differential diagnoses, followed by GPT-4, GPT-3.5, and GPT-2 (top-1: 1.00, 0.54, 0.54, and 0.21; top-5: 1.00, 0.96, 0.89, and 0.54, respectively). GPT-4 and GPT-3.5 tied on top-1 accuracy. Qualitative scores for grammar and readability, imaging findings, and overall quality did not differ significantly between radiologists and GPT-3.5 or GPT-4 (p > 0.05). However, the GPT series scored significantly lower than radiologists for impression and differential diagnosis (p < 0.05).

Conclusions: Our preliminary findings suggest that GPT-3.5 and GPT-4 can potentially generate highly readable radiology reports with reasonable imaging findings from very short keywords. However, concerns remain about the accuracy of their impressions and differential diagnoses, so such reports still require verification by radiologists.
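The abstract above reports top-1 and top-5 accuracy and mean average precision (MAP) over ranked differential diagnoses, but it does not give the formulas. What follows is a minimal Python sketch of one common way to compute these metrics, not the authors' code. It rests on several assumptions. Each case is taken to have a ranked list of differentials and a set of accepted correct diagnoses. MAP is assumed to be computed over the top five entries (AP@5). Matching is a hypothetical case-insensitive exact string comparison, whereas real free-text diagnoses would need expert judgment to handle synonyms. When a case has a single correct diagnosis, its average precision reduces to 1/rank of that diagnosis (0 if it is absent). All function names and the toy cases are illustrative.

```python
from collections.abc import Collection, Sequence


def _norm(dx: str) -> str:
    # Hypothetical matching rule: case-insensitive, whitespace-collapsed exact match.
    # Real free-text diagnoses would need expert judgment for synonyms/abbreviations.
    return " ".join(dx.lower().split())


def top_k_hit(ranked: Sequence[str], correct: Collection[str], k: int) -> bool:
    """True if any of the first k ranked diagnoses is an accepted correct one."""
    truth = {_norm(c) for c in correct}
    return any(_norm(d) in truth for d in ranked[:k])


def average_precision(ranked: Sequence[str], correct: Collection[str], k: int = 5) -> float:
    """AP@k: sum of precision@i at each rank i where a new correct diagnosis
    appears, divided by min(number of correct diagnoses, k)."""
    truth = {_norm(c) for c in correct}
    found: set[str] = set()
    precision_sum = 0.0
    for i, d in enumerate(ranked[:k], start=1):
        nd = _norm(d)
        if nd in truth and nd not in found:  # count each correct diagnosis once
            found.add(nd)
            precision_sum += len(found) / i  # precision at rank i
    denom = min(len(truth), k)
    return precision_sum / denom if denom else 0.0


def evaluate(cases: Sequence[tuple[Sequence[str], Collection[str]]], k: int = 5) -> dict[str, float]:
    """Top-1 accuracy, top-k accuracy and MAP@k over (ranked differentials, correct set) pairs."""
    if not cases:
        raise ValueError("need at least one case")
    n = len(cases)
    return {
        "top1": sum(top_k_hit(r, c, 1) for r, c in cases) / n,
        f"top{k}": sum(top_k_hit(r, c, k) for r, c in cases) / n,
        "map": sum(average_precision(r, c, k) for r, c in cases) / n,
    }


# Toy, made-up cases (NOT data from the study).
example_cases = [
    (["acute appendicitis", "diverticulitis", "epiploic appendagitis"], {"acute appendicitis"}),  # rank 1
    (["renal cell carcinoma", "angiomyolipoma", "oncocytoma"], {"oncocytoma"}),  # rank 3
    (["pneumonia", "atelectasis"], {"pulmonary embolism"}),  # correct diagnosis missing
]
print(evaluate(example_cases))  # top1 = 1/3, top5 = 2/3, map = (1 + 1/3 + 0) / 3 = 4/9
```

With one correct diagnosis per case, MAP in this sketch coincides with mean reciprocal rank. The abstract does not specify the study's cutoff or how it handled multiple acceptable answers, so treat this as one plausible reading.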

Publisher

Springer Science and Business Media LLC

Subject

Radiology, Nuclear Medicine and Imaging

Link

https://link.springer.com/content/pdf/10.1007/s11604-023-01487-y.pdf

References (20 articles)

1. Hartung MP, Bickle IC, Gaillard F, Kanne JP. How to create a great radiology report. Radiographics. 2020;40:1658–70.

2. Parikh JR, Wolfman D, Bender CE, Arleo E. Radiologist burnout according to surveyed radiology practice leaders. J Am Coll Radiol. 2020;17:78–81.

3. Kitahara H, Nagatani Y, Otani H, Nakayama R, Kida Y, Sonoda A, et al. A novel strategy to develop deep learning for image super-resolution using original ultra-high-resolution computed tomography images of lung as training dataset. Jpn J Radiol. 2022;40:38–47.

4. Barat M, Chassagnon G, Dohan A, Gaujoux S, Coriat R, Hoeffel C, et al. Artificial intelligence: a critical review of current applications in pancreatic imaging. Jpn J Radiol. 2021;39:514–23.

5. Chassagnon G, De Margerie-Mellon C, Vakalopoulou M, Marini R, Hoang-Thi T-N, Revel M-P, et al. Artificial intelligence in lung cancer: current applications and perspectives. Jpn J Radiol. 2023;41:235–44.

Cited by 27 articles.

1. Advancing radiology with GPT-4: Innovations in clinical applications, patient engagement, research, and learning. European Journal of Radiology Open. 2024-12.

2. Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors. European Radiology. 2024-08-28.

3. Künstliche Intelligenz in der Medizin: Wo stehen wir heute, und was liegt vor uns? [Artificial intelligence in medicine: where do we stand today, and what lies ahead?] Zeitschrift für Herz-, Thorax- und Gefäßchirurgie. 2024-08-27.

4. Multi-modal transformer architecture for medical image analysis and automated report generation. Scientific Reports. 2024-08-20.

5. DKA-RG: Disease-Knowledge-Enhanced Fine-Grained Image–Text Alignment for Automatic Radiology Report Generation. Electronics. 2024-08-20.