Diagnostic Performance of Generative AI and Physicians: A Systematic Review and Meta-Analysis

Authors:

Hirotaka Takita, Shannon L. Walston, Hiroyuki Tatekawa, Kenichi Saito, Yasushi Tsujimoto, Yukio Miki, Daiju Ueda

Abstract

Background

The rapid advancement of generative artificial intelligence (AI) has revolutionized the understanding and generation of human language. The integration of these models into healthcare has shown potential for improving medical diagnostics, yet a comprehensive evaluation of their diagnostic performance, and a comparison of that performance with physicians', has not been extensively explored.

Methods

In this systematic review and meta-analysis, a comprehensive search of Medline, Scopus, Web of Science, Cochrane Central, and medRxiv was conducted for studies published from June 2018 through December 2023, focusing on those that validate generative AI models for diagnostic tasks. A meta-analysis was performed to summarize the performance of the models and to compare their accuracy with that of physicians. The quality of the included studies was assessed using the Prediction Model Study Risk of Bias Assessment Tool.

Results

The search identified 54 studies for inclusion in the meta-analysis, 13 of which were also used in the comparative analysis. Eight models were evaluated across 17 medical specialties. The overall accuracy of generative AI models across the 54 studies was 57% (95% confidence interval [CI]: 51–63%). An I-squared statistic of 96% indicates a high degree of heterogeneity among the study results. Meta-regression analysis revealed significantly improved accuracy for GPT-4 and reduced accuracy in some specialties, such as Neurology, Endocrinology, Rheumatology, and Radiology. The comparative meta-analysis demonstrated that, on average, physicians exceeded the accuracy of the models (difference in accuracy: 14% [95% CI: 8–19%], p < 0.001). However, GPT-4 performed slightly better than non-expert physicians (difference: –4% [95% CI: –10% to 2%], p = 0.173) and slightly worse than experts (difference: 6% [95% CI: –1% to 13%], p = 0.091), although neither difference reached statistical significance. The quality assessment indicated a high risk of bias in the majority of studies, primarily due to small sample sizes.

Conclusions

Generative AI exhibits promising diagnostic capabilities, with accuracy varying significantly by model and medical specialty. Although these models have not reached the reliability of expert physicians, the findings suggest that they have the potential to enhance healthcare delivery and medical education, provided they are integrated with caution and their limitations are well understood. This study also highlights the need for more rigorous research standards and larger numbers of cases in future work.
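A pooled accuracy with a confidence interval and an I-squared heterogeneity statistic, as reported above, is conventionally obtained from a random-effects meta-analysis of logit-transformed proportions. The sketch below is a minimal Python illustration using the DerSimonian-Laird estimator; it is not the authors' analysis code, and the study counts in the usage example are illustrative placeholders, not data from the included studies.

```python
import math

def pool_accuracies(events, totals):
    """Random-effects (DerSimonian-Laird) pooling of accuracy
    proportions on the logit scale.

    events: number of correct diagnoses per study
    totals: number of cases per study
    Returns (pooled accuracy, (95% CI low, 95% CI high), I-squared %).
    """
    # Logit-transform each study's accuracy; the within-study
    # variance of a logit proportion is 1/e + 1/(n - e).
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1.0 / e + 1.0 / (n - e) for e, n in zip(events, totals)]

    # Fixed-effect weights and Cochran's Q statistic.
    w = [1.0 / vi for vi in v]
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance (tau^2) and I-squared heterogeneity.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects pooled estimate, back-transformed to a proportion.
    wr = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(wr, y)) / sum(wr)
    se = math.sqrt(1.0 / sum(wr))
    expit = lambda x: 1.0 / (1.0 + math.exp(-x))
    return expit(mu), (expit(mu - 1.96 * se), expit(mu + 1.96 * se)), i2


# Illustrative placeholder data: correct counts and case counts
# for three hypothetical studies.
acc, (ci_lo, ci_hi), i2 = pool_accuracies([55, 72, 40], [100, 120, 100])
print(f"pooled accuracy {acc:.0%} (95% CI {ci_lo:.0%}-{ci_hi:.0%}), I2 = {i2:.0f}%")
```

The logit transform keeps the confidence interval inside the 0–100% range, and the between-study variance tau^2 widens the interval when, as here, heterogeneity across studies is substantial.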

Publisher

Cold Spring Harbor Laboratory

References: 78 articles.


