Outcome class imbalance and rare events: An underappreciated complication for overdose risk prediction modeling

Author:

Cartus Abigail R. (1), Samuels Elizabeth A. (1, 2), Cerdá Magdalena (3), Marshall Brandon D. L. (1)

Affiliation:

1. Department of Epidemiology, Brown University School of Public Health, Providence, Rhode Island, USA

2. Department of Emergency Medicine, Alpert Medical School of Brown University, Providence, Rhode Island, USA

3. Division of Epidemiology, Department of Population Health, Center for Opioid Epidemiology and Policy, School of Medicine, New York University, New York, New York, USA

Abstract

Background and aims: Low outcome prevalence, often observed with opioid-related outcomes, poses an underappreciated challenge to accurate predictive modeling. Outcome class imbalance, where non-events (i.e. negative class observations) outnumber events (i.e. positive class observations) by a moderate to extreme degree, can distort measures of predictive accuracy in misleading ways and make the overall predictive accuracy and the discriminatory ability of a predictive model appear spuriously high. We conducted a simulation study to measure the impact of outcome class imbalance on the predictive performance of a simple SuperLearner ensemble model and to suggest strategies for reducing that impact.

Design, setting, participants: Using a Monte Carlo design with 250 repetitions, we trained and evaluated these models on four simulated data sets with 100 000 observations each: one with perfect balance between events and non-events, and three where non-events outnumbered events by an approximate factor of 10:1, 100:1, and 1000:1, respectively.

Measurements: We evaluated the performance of these models using a comprehensive suite of measures, including measures that are more appropriate for imbalanced data.

Findings: Increasing imbalance tended to spuriously improve overall accuracy (using a high threshold to classify events vs non-events, overall accuracy improved from 0.45 with perfect balance to 0.99 with the most severe outcome class imbalance), but diminished predictive performance was evident using other metrics (corresponding positive predictive value decreased from 0.99 to 0.14).

Conclusion: Increasing reliance on algorithmic risk scores in consequential decision-making processes raises critical fairness and ethical concerns. This paper provides broad guidance for analytic strategies that clinical investigators can use to remedy the impacts of outcome class imbalance on risk prediction tools.
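The mechanism behind the findings can be illustrated with a short arithmetic sketch. It is not the authors' simulation and does not use SuperLearner. It assumes a hypothetical classifier whose sensitivity (0.30) and specificity (0.99) stay fixed, and it varies only the ratio of non-events to events across the four ratios named in the abstract. The function names and the operating point are illustrative assumptions, so the numbers it produces are not expected to match the reported results.

```python
def accuracy_and_ppv(prevalence, sensitivity, specificity):
    """Expected overall accuracy and positive predictive value (PPV)
    for a classifier with the given sensitivity and specificity,
    applied to a population with the given event prevalence."""
    tp = prevalence * sensitivity                # true positives (as a proportion)
    tn = (1.0 - prevalence) * specificity        # true negatives
    fp = (1.0 - prevalence) * (1.0 - specificity)  # false positives
    accuracy = tp + tn
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return accuracy, ppv


def sweep(ratios=(1, 10, 100, 1000), sensitivity=0.30, specificity=0.99):
    """Evaluate accuracy and PPV at several non-event:event ratios.

    The sensitivity and specificity defaults are hypothetical values
    chosen for illustration, not estimates from the paper."""
    rows = []
    for ratio in ratios:
        prevalence = 1.0 / (1.0 + ratio)  # e.g. 10:1 -> 1/11
        acc, ppv = accuracy_and_ppv(prevalence, sensitivity, specificity)
        rows.append((ratio, prevalence, acc, ppv))
    return rows


if __name__ == "__main__":
    for ratio, prev, acc, ppv in sweep():
        print(f"{ratio:>5}:1  prevalence={prev:.4f}  "
              f"accuracy={acc:.3f}  PPV={ppv:.3f}")
```

Under these assumptions, accuracy rises as events become rarer because true negatives dominate the total, while PPV falls because false positives from the large negative class outnumber the few true positives. This is the same direction of effect the abstract reports.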

Funder

National Institute on Drug Abuse

Publisher

Wiley

Subject

Psychiatry and Mental Health, Medicine (miscellaneous)

Cited by 1 article.
