Assessing and Mitigating Bias in Medical Artificial Intelligence

Author:

Noseworthy Peter A. (1, 2), Attia Zachi I. (1), Brewer LaPrincess C. (1), Hayes Sharonne N. (1, 3), Yao Xiaoxi (1, 2, 4), Kapa Suraj (1), Friedman Paul A. (1), Lopez-Jimenez Francisco (1)

Affiliation:

1. Department of Cardiovascular Medicine (P.A.N., Z.I.A., L.C.B., S.N.H., X.Y., S.K., P.A.F., F.L.-J.), Mayo Clinic, Rochester, MN.

2. Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery (P.A.N., X.Y.), Mayo Clinic, Rochester, MN.

3. Office of Diversity and Inclusion (S.N.H.), Mayo Clinic, Rochester, MN.

4. Division of Health Care Policy and Research, Department of Health Sciences Research (X.Y.), Mayo Clinic, Rochester, MN.

Abstract

Background: Deep learning algorithms derived in homogeneous populations may generalize poorly and have the potential to reflect, perpetuate, and even exacerbate racial/ethnic disparities in health and health care. In this study, we aimed (1) to assess whether the performance of a deep learning algorithm designed to detect low left ventricular ejection fraction from the 12-lead ECG varies by race/ethnicity and (2) to determine whether its performance is determined by the derivation population or by racial variation in the ECG.

Methods: We performed a retrospective cohort analysis that included 97 829 patients with paired ECGs and echocardiograms. We tested model performance by race/ethnicity for a convolutional neural network designed to identify patients with a left ventricular ejection fraction ≤35% from the 12-lead ECG.

Results: The convolutional neural network, which was previously derived in a homogeneous population (derivation cohort, n=44 959; 96.2% non-Hispanic white), demonstrated consistent performance in detecting low left ventricular ejection fraction across a range of racial/ethnic subgroups in a separate testing cohort (n=52 870): non-Hispanic white (n=44 524; area under the curve [AUC], 0.931), Asian (n=557; AUC, 0.961), black/African American (n=651; AUC, 0.937), Hispanic/Latino (n=331; AUC, 0.937), and American Indian/Native Alaskan (n=223; AUC, 0.938). In secondary analyses, a separate neural network was able to discern racial subgroup category (black/African American [AUC, 0.84] and white, non-Hispanic [AUC, 0.76] in a 5-class classifier), and a network trained only in non-Hispanic whites from the original derivation cohort performed similarly well across racial/ethnic subgroups in the testing cohort, with an AUC of at least 0.930 in every subgroup.

Conclusions: Our study demonstrates that although ECG characteristics vary by race, this variation did not affect the ability of a convolutional neural network to predict low left ventricular ejection fraction from the ECG. We recommend reporting performance among diverse ethnic, racial, age, and sex groups for all new artificial intelligence tools to ensure responsible use of artificial intelligence in medicine.
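
The subgroup reporting recommended above can be illustrated with a minimal sketch. This is not the authors' code: the function names (`auc`, `auc_by_subgroup`), the subgroup labels "A" and "B", and the toy data are all assumptions made for illustration. The sketch computes the AUC separately within each subgroup using the pairwise (Mann-Whitney) definition, which is simple to read but quadratic in subgroup size.

```python
from collections import defaultdict


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise definition.

    Counts, over all (positive, negative) pairs, how often the positive
    case receives the higher score; tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(labels, scores, groups):
    """Return {subgroup: AUC}, computed separately within each subgroup."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}


# Toy, made-up data for illustration only (not taken from the study).
labels = [1, 0, 1, 0, 1, 0, 0, 1]  # 1 = low ejection fraction (hypothetical)
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.7, 0.1, 0.8]  # model output
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]  # hypothetical subgroups
result = auc_by_subgroup(labels, scores, groups)
print(result)
```

Reporting one value per subgroup, rather than a single pooled AUC, is what makes a performance gap visible when a subgroup is small relative to the whole cohort. In practice, small subgroups would also call for confidence intervals, which this sketch omits.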

Publisher

Ovid Technologies (Wolters Kluwer Health)

Subject

Physiology (medical), Cardiology and Cardiovascular Medicine


Cited by 139 articles.