Assessing the Performance of a New Artificial Intelligence–Driven Diagnostic Support Tool Using Medical Board Exam Simulations: Clinical Vignette Study

Authors:

Ben-Shabat Niv, Sloma Ariel, Weizman Tomer, Kiderman David, Amital Howard

Abstract

Background: Diagnostic decision support systems (DDSS) are computer programs that aim to improve health care by supporting clinicians in diagnostic decision-making. Previous studies on DDSS demonstrated their ability to enhance clinicians’ diagnostic skills, prevent diagnostic errors, and reduce hospitalization costs. Despite these potential benefits, their use in clinical practice is limited, emphasizing the need for new and improved products.

Objective: The aim of this study was to conduct a preliminary analysis of the diagnostic performance of “Kahun,” a new artificial intelligence-driven diagnostic tool.

Methods: Diagnostic performance was evaluated based on the program’s ability to “solve” clinical cases from United States Medical Licensing Examination Step 2 Clinical Skills board exam simulations, drawn from the case banks of 3 leading preparation companies. Each case included 3 expected differential diagnoses. The cases were entered into the Kahun platform by 3 blinded junior physicians. For each case, the presence and the rank of the correct diagnoses within the generated differential diagnosis list were recorded. Diagnostic performance was measured in two ways: first, as diagnostic sensitivity, and second, as case-specific success rates that represent diagnostic comprehensiveness.

Results: The study included 91 clinical cases with 78 different chief complaints and a mean of 38 (SD 8) findings per case. The total number of expected diagnoses was 272, of which 174 were different (some appeared more than once). Of the 272 expected diagnoses, 231 (87.5%; 95% CI 76-99) were suggested within the top 20 listed diagnoses, 209 (76.8%; 95% CI 66-87) within the top 10, and 168 (61.8%; 95% CI 52-71) within the top 5. The median rank of correct diagnoses was 3 (IQR 2-6). Of the 91 cases, in 62 (68%; 95% CI 59-78) all 3 of the case’s expected diagnoses were suggested within the top 20 listed diagnoses; in 44 (48%; 95% CI 38-59), within the top 10; and in 24 (26%; 95% CI 17-35), within the top 5. Of the 91 cases, in 87 (96%; 95% CI 91-100) at least 2 of the case’s 3 expected diagnoses were suggested within the top 20 listed diagnoses; in 78 (86%; 95% CI 79-93), within the top 10; and in 61 (67%; 95% CI 57-77), within the top 5.

Conclusions: The diagnostic support tool evaluated in this study demonstrated good diagnostic accuracy and comprehensiveness; it was also able to handle a wide range of clinical findings.
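The two measures described in the Methods can be illustrated with a minimal sketch. It is not the authors' analysis code: the data layout (one list of ranks per case, with `None` for an expected diagnosis that was not listed), the function names, and the toy values are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Hypothetical layout: one entry per case, holding the 1-based rank of each
# expected diagnosis in the generated list, or None if it was not listed.
CaseRanks = Sequence[Optional[int]]


def hits_in_top_k(ranks: CaseRanks, k: int) -> int:
    """Count the expected diagnoses of one case ranked within the top k."""
    return sum(1 for r in ranks if r is not None and r <= k)


def top_k_sensitivity(cases: Sequence[CaseRanks], k: int) -> float:
    """Fraction of all expected diagnoses (pooled over cases) in the top k."""
    total = sum(len(ranks) for ranks in cases)
    return sum(hits_in_top_k(ranks, k) for ranks in cases) / total


def case_success_rate(cases: Sequence[CaseRanks], k: int, min_hits: int) -> float:
    """Fraction of cases with at least min_hits expected diagnoses in the top k."""
    successes = sum(1 for ranks in cases if hits_in_top_k(ranks, k) >= min_hits)
    return successes / len(cases)


# Toy data, invented for illustration only (not study data).
toy_cases = [
    [1, 4, 12],
    [2, None, 7],
    [3, 6, 25],
]

sens_top10 = top_k_sensitivity(toy_cases, k=10)
all3_top20 = case_success_rate(toy_cases, k=20, min_hits=3)
two_of_3_top20 = case_success_rate(toy_cases, k=20, min_hits=2)
```

Pooling over diagnoses gives the sensitivity figure, while grouping by case gives the comprehensiveness figures (all 3, or at least 2 of 3, expected diagnoses within the top k).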

Publisher

JMIR Publications Inc.

Subject

Health Information Management, Health Informatics
