Radiological Differential Diagnoses Based on Cardiovascular and Thoracic Imaging Patterns: Perspectives of Four Large Language Models

Authors:

Sarangi Pradosh Kumar (1), Irodi Aparna (2), Panda Swaha (3), Nayak Debasish Swapnesh Kumar (4), Mondal Himel (5)

Affiliations:

1. Department of Radiodiagnosis, All India Institute of Medical Sciences, Deoghar, Jharkhand, India

2. Department of Radiodiagnosis, Christian Medical College and Hospital, Vellore, Tamil Nadu, India

3. Department of Otorhinolaryngology and Head and Neck Surgery, All India Institute of Medical Sciences, Deoghar, Jharkhand, India

4. Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar, Odisha, India

5. Department of Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India

Abstract

Background: Differential diagnosis in radiology is a critical aspect of clinical decision-making. Radiologists in the early stages of their careers may find it difficult to list differential diagnoses from imaging patterns. In this context, large language models (LLMs) offer new opportunities, as they can access and contextualize extensive information from text-based input.

Objective: To explore the utility of four LLMs (ChatGPT3.5, Google Bard, Microsoft Bing, and Perplexity) in providing the most important differential diagnoses for cardiovascular and thoracic imaging patterns.

Methods: We selected 15 unique cardiovascular (n = 5) and thoracic (n = 10) imaging patterns and asked each model to generate the top 5 most important differential diagnoses for every pattern. Concurrently, a panel of two cardiothoracic radiologists independently identified the top 5 differentials for each case and reached consensus where discrepancies occurred. We assessed the concordance and acceptance of the LLM-generated differentials against the consensus differential diagnoses. Categorical variables were compared using the binomial, chi-squared, or Fisher's exact test.

Results: Fifteen cases with five differentials each yielded 75 items to analyze. Concordance with expert consensus was highest for Perplexity (66.67%), followed by ChatGPT (65.33%) and Bing (62.67%), and lowest for Bard (45.33%). The acceptance rate was highest for Perplexity (90.67%), followed by Bing (89.33%) and ChatGPT (85.33%), and lowest for Bard (69.33%).

Conclusion: The differential diagnoses generated by the four LLMs (ChatGPT3.5, Google Bard, Microsoft Bing, and Perplexity) had a high level of acceptance but relatively lower concordance. Acceptance and concordance differed significantly among the LLMs. Hence, it is important to select a suitable model carefully for use in patient care or medical education.
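The concordance comparison can be illustrated with a short sketch. The abstract does not state which test was applied to which comparison, so the use of a Pearson chi-squared test across all four models is an assumption here, and the counts are back-calculated from the reported percentages (each out of 75 items) rather than quoted from the paper. The function names are hypothetical.

```python
import math

# Counts back-calculated from the reported concordance percentages
# (each out of 75 items); an assumption of this sketch, not quoted data.
TOTAL_ITEMS = 75
CONCORDANT = {"Perplexity": 50, "ChatGPT": 49, "Bing": 47, "Bard": 34}


def percent(count, total=TOTAL_ITEMS):
    """Convert a count into a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


def chi_squared_statistic(successes, total):
    """Pearson chi-squared statistic for a k x 2 table.

    Each group contributes `total` items, split into successes and failures.
    """
    k = len(successes)
    pooled_rate = sum(successes) / (k * total)
    expected_yes = total * pooled_rate
    expected_no = total * (1 - pooled_rate)
    stat = 0.0
    for s in successes:
        stat += (s - expected_yes) ** 2 / expected_yes
        stat += ((total - s) - expected_no) ** 2 / expected_no
    return stat


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom.

    Closed form valid only for df = 3 (four groups, two outcomes).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


stat = chi_squared_statistic(list(CONCORDANT.values()), TOTAL_ITEMS)
p_value = chi2_sf_df3(stat)

print({model: percent(count) for model, count in CONCORDANT.items()})
print(f"chi2 = {stat:.3f}, df = 3, p = {p_value:.4f}")
```

The same functions could be reused for the acceptance rates by swapping in the corresponding back-calculated counts. With a statistics library available, a routine such as a contingency-table chi-squared test would replace the hand-written statistic and closed-form tail probability.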

Publisher

Georg Thieme Verlag KG

Subject

Radiology, Nuclear Medicine and Imaging
