Abstract
ABSTRACTPurposeTo investigate and compare the diagnostic performance of ChatGPT 3.5, Google Bard, Microsoft Bing, and two board-certified radiologists in thoracic radiology cases published by The Society of Thoracic Radiology.Materials and MethodsWe collected 124 “Case of the Month” from the Society of Thoracic Radiology website between March 2012 and December 2023. Medical history and imaging findings were input into ChatGPT 3.5, Google Bard, and Microsoft Bing for diagnosis and differential diagnosis. Two board-certified radiologists provided their diagnoses. Cases were categorized anatomically (parenchyma, airways, mediastinum-pleura-chest wall, and vascular) and further classified as specific or non-specific for radiological diagnosis. Diagnostic accuracy and differential diagnosis scores were analyzed using chi-square, Kruskal-Wallis and Mann-Whitney U tests.ResultsAmong 124 cases, ChatGPT demonstrated the highest diagnostic accuracy (53.2%), outperforming radiologists (52.4% and 41.1%), Bard (33.1%), and Bing (29.8%). Specific cases revealed varying diagnostic accuracies, with Radiologist I achieving (65.6%), surpassing ChatGPT (63.5%), Radiologist II (52.0%), Bard (39.5%), and Bing (35.4%). ChatGPT 3.5 and Bing had higher differential scores in specific cases (P<0.05), whereas Bard did not (P=0.114). All three had a higher diagnostic accuracy in specific cases (P<0.05). No differences were found in the diagnostic accuracy or differential diagnosis scores of the four anatomical location (P>0.05).ConclusionChatGPT 3.5 demonstrated higher diagnostic accuracy than Bing, Bard and radiologists in text-based thoracic radiology cases. Large language models hold great promise in this field under proper medical supervision.
Publisher
Cold Spring Harbor Laboratory
Reference23 articles.
1. Large language models in medicine;Nat Med,2023
2. Comparing the Efficacy of Large Language Models ChatGPT, BARD, and Bing AI in Providing Information on Rhinoplasty: An Observational Study;Aesthet Surg J Open Forum,2023
3. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns;Healthcare (Basel,2023
4. Bera K , O’Connor G , Jiang S , et al. Analysis of ChatGPT publications in radiology: Literature so far. Curr Probl Diagn Radiol. Published online October 20, 2023.
5. Evaluation of an Artificial Intelligence Chatbot for Delivery of IR Patient Education Material: A Comparison with Societal Website Content;J Vasc Interv Radiol,2023
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献