Large Language Model Influence on Management Reasoning: A Randomized Controlled Trial

Authors:

Goh Ethan, Gallo Robert, Strong Eric, Weng Yingjie, Kerman Hannah, Freed Jason, Cool Joséphine A., Kanjee Zahir, Lane Kathleen P., Parsons Andrew S., Ahuja Neera, Horvitz Eric, Yang Daniel, Milstein Arnold, Olson Andrew P.J., Hom Jason, Chen Jonathan H., Rodman Adam

Abstract

Importance: Large language model (LLM) artificial intelligence (AI) systems have shown promise in diagnostic reasoning, but their utility in management reasoning, where there are no clear right answers, is unknown.

Objective: To determine whether LLM assistance improves physician performance on open-ended management reasoning tasks compared with conventional resources.

Design: Prospective, randomized controlled trial conducted from 30 November 2023 to 21 April 2024.

Setting: Multi-institutional study from Stanford University, Beth Israel Deaconess Medical Center, and the University of Virginia involving physicians from across the United States.

Participants: 92 practicing attending physicians and residents with training in internal medicine, family medicine, or emergency medicine.

Intervention: Five expert-developed clinical case vignettes were presented with multiple open-ended management questions and scoring rubrics created through a Delphi process. Physicians were randomized to use either GPT-4 via ChatGPT Plus in addition to conventional resources (e.g., UpToDate, Google) or conventional resources alone.

Main Outcomes and Measures: The primary outcome was the difference in total score between groups on expert-developed scoring rubrics. Secondary outcomes included domain-specific scores and time spent per case.

Results: Physicians using the LLM scored higher than those using conventional resources (mean difference 6.5%, 95% CI 2.7-10.2, p<0.001). Significant improvements were seen in the management decisions (6.1%, 95% CI 2.5-9.7, p=0.001), diagnostic decisions (12.1%, 95% CI 3.1-21.0, p=0.009), and case-specific (6.2%, 95% CI 2.4-9.9, p=0.002) domains. GPT-4 users spent more time per case (mean difference 119.3 seconds, 95% CI 17.4-221.2, p=0.02). There was no significant difference between GPT-4-augmented physicians and GPT-4 alone (-0.9%, 95% CI -9.0 to 7.2, p=0.8).

Conclusions and Relevance: LLM assistance improved physician management reasoning compared with conventional resources, with particular gains in contextual and patient-specific decision-making. These findings indicate that LLMs can augment management decision-making in complex cases.

Trial Registration: ClinicalTrials.gov Identifier: NCT06208423; https://classic.clinicaltrials.gov/ct2/show/NCT06208423

Key Points

Question: Does large language model (LLM) assistance improve physician performance on complex management reasoning tasks compared with conventional resources?

Findings: In this randomized controlled trial of 92 physicians, participants using GPT-4 achieved higher scores on management reasoning than those using conventional resources (e.g., UpToDate).

Meaning: LLM assistance enhances physician management reasoning performance in complex cases with no clear right answers.
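To make the headline statistic concrete, the sketch below (Python, not from the trial) simulates one aggregate rubric score per physician and computes a between-group mean difference with a 95% confidence interval using Welch's method. The group sizes, means, and spreads are invented for illustration, and the trial's actual analysis of repeated per-case scores was presumably more elaborate (for example, a model accounting for repeated measures), so treat this only as a minimal worked example of the reported "mean difference, 95% CI, p-value" format.

```python
# Minimal sketch (not the authors' analysis code): estimating a between-group
# mean score difference with a 95% CI, assuming one aggregate percentage score
# per physician and a simple two-sample (Welch) comparison.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-physician total scores (% of rubric points), 46 per arm.
llm_arm = rng.normal(loc=53.0, scale=9.0, size=46)           # GPT-4 + conventional resources
conventional_arm = rng.normal(loc=46.5, scale=9.0, size=46)   # conventional resources only

diff = llm_arm.mean() - conventional_arm.mean()

# Welch's two-sample comparison, computed explicitly.
v1, v2 = llm_arm.var(ddof=1), conventional_arm.var(ddof=1)
n1, n2 = llm_arm.size, conventional_arm.size
se = np.sqrt(v1 / n1 + v2 / n2)
df = (v1 / n1 + v2 / n2) ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
t_stat = diff / se
p_value = 2 * stats.t.sf(abs(t_stat), df)
ci = diff + np.array([-1.0, 1.0]) * stats.t.ppf(0.975, df) * se

print(f"mean difference = {diff:.1f} points, "
      f"95% CI ({ci[0]:.1f}, {ci[1]:.1f}), p = {p_value:.3g}")
```

The same pattern applies to the secondary comparisons (domain scores, time per case): each is summarized as a difference in group means with a confidence interval, so the effect size and its uncertainty are read directly rather than from the p-value alone.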

Publisher

Cold Spring Harbor Laboratory

