
Evaluating ChatGPT-4V in chest CT diagnostics: a critical image interpretation assessment

Published: 2024-06-13
ISSN: 1867-1071
Container-title: Japanese Journal of Radiology
Language: en
Short-container-title: Jpn J Radiol

Author:

Dehdab Reza, Brendlin Andreas, Werner Sebastian, Almansour Haidara, Gassenmaier Sebastian, Brendel Jan Michael, Nikolaou Konstantin, Afat Saif

Abstract

Purpose: To assess the diagnostic accuracy of ChatGPT-4V in interpreting a set of four chest CT slices for each case of COVID-19, non-small cell lung cancer (NSCLC), and control cases, thereby evaluating its potential as an AI tool in radiological diagnostics.

Materials and methods: In this retrospective study, 60 CT scans from The Cancer Imaging Archive, covering COVID-19, NSCLC, and control cases, were analyzed using ChatGPT-4V. A radiologist selected four CT slices from each scan for evaluation. ChatGPT-4V's interpretations were compared against the gold standard diagnoses and assessed by two radiologists. Statistical analyses focused on accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), along with an examination of the impact of pathology location and lobe involvement.

Results: ChatGPT-4V showed an overall diagnostic accuracy of 56.76%. For NSCLC, sensitivity was 27.27% and specificity was 60.47%. For COVID-19, sensitivity was 13.64% and specificity was 64.29%. For control cases, sensitivity was 31.82% and specificity was 95.24%. The highest sensitivity (83.33%) was observed in cases involving all lung lobes. Chi-squared analysis indicated significant differences in sensitivity across diagnostic categories and in relation to the location and lobar involvement of pathologies.

Conclusion: ChatGPT-4V demonstrated variable diagnostic performance in chest CT interpretation, with notable proficiency in specific scenarios. This underscores the challenges of cross-modal AI models like ChatGPT-4V in radiology, pointing toward significant areas for improvement to ensure dependability. The study emphasizes the importance of enhancing these models for broader, more reliable medical use.
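The metrics named in the abstract (sensitivity, specificity, PPV, NPV, accuracy) are all derived from one-vs-rest confusion counts for a diagnostic category. The following is a minimal sketch of those standard definitions, not the authors' analysis code. The names `BinaryCounts`, `safe_ratio`, and `diagnostic_metrics` are hypothetical, and the example counts are invented for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """One-vs-rest confusion counts for a single diagnostic category."""
    tp: int  # category present, model reported it
    fn: int  # category present, model missed it
    fp: int  # category absent, model reported it
    tn: int  # category absent, model did not report it


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c):
    """Compute the standard diagnostic metrics from confusion counts."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "sensitivity": safe_ratio(c.tp, c.tp + c.fn),
        "specificity": safe_ratio(c.tn, c.tn + c.fp),
        "ppv": safe_ratio(c.tp, c.tp + c.fp),
        "npv": safe_ratio(c.tn, c.tn + c.fn),
        "accuracy": safe_ratio(c.tp + c.tn, total),
    }


# Hypothetical counts for illustration only; NOT the study's data.
example = BinaryCounts(tp=8, fn=12, fp=5, tn=35)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

In a three-class setting such as COVID-19, NSCLC, and control, each category would get its own set of counts by treating that category as positive and the other two as negative.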

Funder

Universitätsklinikum Tübingen

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s11604-024-01606-3.pdf

References: 38 articles.

1. Dalla PL. Tomorrow’s radiologist: what future? Radiol Med (Torino). 2006;111(5):621–33. https://doi.org/10.1007/S11547-006-0060-1.

2. Jorritsma W, Cnossen F, Van Ooijen PMA. Improving the radiologist-CAD interaction: designing for appropriate trust. Clin Radiol. 2015;70(2):115–22. https://doi.org/10.1016/J.CRAD.2014.09.017.

3. Rajpurkar P, Irvin J, Ball RL, Zhu K, Yang B, Mehta H, et al. Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 2018;15(11):e1002686. https://doi.org/10.1371/JOURNAL.PMED.1002686.

4. Hwang EJ, Park S, Jin KN, Kim JI, Choi SY, Lee JH, et al. Development and validation of a deep learning-based automated detection algorithm for major thoracic diseases on chest radiographs. JAMA Netw Open. 2019;2(3):e191095. https://doi.org/10.1001/JAMANETWORKOPEN.2019.1095.

5. Nam JG, Park S, Hwang EJ, Lee JH, Jin KN, Lim KY, et al. Development and validation of deep learning-based automatic detection algorithm for malignant pulmonary nodules on chest radiographs. Radiology. 2019;290(1):218–28. https://doi.org/10.1148/RADIOL.2018180237.

Cited by 1 article.

1. Assessing ChatGPT-4's Proficiency in English College Entrance Examinations Using Web Raschonline: A Comparative Study (Preprint); 2024-07-19