Measuring the Impact of AI in the Diagnosis of Hospitalized Patients

Author:

Jabbour Sarah (1), Fouhey David (1, 2, 3), Shepard Stephanie (1), Valley Thomas S. (4), Kazerooni Ella A. (5), Banovic Nikola (1), Wiens Jenna (1), Sjoding Michael W. (4)

Affiliation:

1. Computer Science and Engineering, University of Michigan, Ann Arbor

2. Now with Computer Science, Courant Institute, New York University, New York

3. Now with Electrical and Computer Engineering, Tandon School of Engineering, New York University, New York

4. Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor

5. Department of Radiology, University of Michigan Medical School, Ann Arbor

Abstract

Importance: Artificial intelligence (AI) could support clinicians when diagnosing hospitalized patients; however, systematic bias in AI models could worsen clinician diagnostic accuracy. Recent regulatory guidance has called for AI models to include explanations to mitigate errors made by models, but the effectiveness of this strategy has not been established.

Objectives: To evaluate the impact of systematically biased AI on clinician diagnostic accuracy and to determine if image-based AI model explanations can mitigate model errors.

Design, Setting, and Participants: Randomized clinical vignette survey study administered between April 2022 and January 2023 across 13 US states involving hospitalist physicians, nurse practitioners, and physician assistants.

Interventions: Clinicians were shown 9 clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination, laboratory results, and chest radiographs. Clinicians were then asked to determine the likelihood of pneumonia, heart failure, or chronic obstructive pulmonary disease as the underlying cause(s) of each patient’s acute respiratory failure. To establish baseline diagnostic accuracy, clinicians were shown 2 vignettes without AI model input. Clinicians were then randomized to see 6 vignettes with AI model input with or without AI model explanations. Among these 6 vignettes, 3 vignettes included standard-model predictions, and 3 vignettes included systematically biased model predictions.

Main Outcomes and Measures: Clinician diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease.

Results: Median participant age was 34 years (IQR, 31-39) and 241 (57.7%) were female. Four hundred fifty-seven clinicians were randomized and completed at least 1 vignette, with 231 randomized to AI model predictions without explanations, and 226 randomized to AI model predictions with explanations. Clinicians’ baseline diagnostic accuracy was 73.0% (95% CI, 68.3% to 77.8%) for the 3 diagnoses. When shown a standard AI model without explanations, clinician accuracy increased over baseline by 2.9 percentage points (95% CI, 0.5 to 5.2) and by 4.4 percentage points (95% CI, 2.0 to 6.9) when clinicians were also shown AI model explanations. Systematically biased AI model predictions decreased clinician accuracy by 11.3 percentage points (95% CI, 7.2 to 15.5) compared with baseline, and providing biased AI model predictions with explanations decreased clinician accuracy by 9.1 percentage points (95% CI, 4.9 to 13.2) compared with baseline, representing a nonsignificant improvement of 2.3 percentage points (95% CI, −2.7 to 7.2) compared with the systematically biased AI model.

Conclusions and Relevance: Although standard AI models improve diagnostic accuracy, systematically biased AI models reduced diagnostic accuracy, and commonly used image-based AI model explanations did not mitigate this harmful effect.

Trial Registration: ClinicalTrials.gov Identifier: NCT06098950
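The percentage-point figures in the Results can be read side by side with a small arithmetic sketch. It assumes, purely for illustration, that each reported change can be added directly to the 73.0% baseline; the study's own model-based estimates may differ slightly from these sums. The names `implied_accuracy` and `ci_excludes_zero` are hypothetical helpers, not part of the study. Note that subtracting the two rounded biased-model changes gives 2.2 points, whereas the abstract reports 2.3, presumably because that figure was computed from unrounded estimates.

```python
# Figures copied from the Results paragraph of the abstract.
BASELINE_ACCURACY = 73.0  # percent

# Reported change versus baseline, in percentage points, per condition.
CHANGE_VS_BASELINE = {
    "standard AI, no explanations": 2.9,
    "standard AI, with explanations": 4.4,
    "biased AI, no explanations": -11.3,
    "biased AI, with explanations": -9.1,
}


def implied_accuracy(baseline: float, change: float) -> float:
    """Assumption: a point change simply adds to the baseline accuracy."""
    return round(baseline + change, 1)


def ci_excludes_zero(lower: float, upper: float) -> bool:
    """A 95% CI for a difference that excludes 0 is conventionally
    read as statistically significant at the 5% level."""
    return lower > 0 or upper < 0


implied = {
    condition: implied_accuracy(BASELINE_ACCURACY, change)
    for condition, change in CHANGE_VS_BASELINE.items()
}

# Difference between the two biased-model conditions, from rounded figures.
explanation_gain_biased = round(
    CHANGE_VS_BASELINE["biased AI, with explanations"]
    - CHANGE_VS_BASELINE["biased AI, no explanations"],
    1,
)

for condition, accuracy in implied.items():
    print(f"{condition}: about {accuracy}%")
print(f"explanations vs none (biased model): about {explanation_gain_biased} points")

# CI for the standard model without explanations: 0.5 to 5.2.
print("standard-model gain significant:", ci_excludes_zero(0.5, 5.2))
# CI for the explanation effect under the biased model: -2.7 to 7.2.
print("explanation effect significant:", ci_excludes_zero(-2.7, 7.2))
```

Because the interval for the explanation effect spans zero, the sketch agrees with the abstract's description of that improvement as nonsignificant.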

Publisher

American Medical Association (AMA)

Subject

General Medicine


