Predictive performance of machine and statistical learning methods: Impact of data-generating processes on external validity in the “large N, small p” setting

Author:

Austin Peter C (1, 2, 3), Harrell Frank E (4), Steyerberg Ewout W (5, 6)

Affiliation:

1. ICES, Toronto, ON, Canada

2. Department of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada

3. Schulich Heart Research Program, Sunnybrook Research Institute, Toronto, ON, Canada

4. Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA

5. Department of Public Health, Erasmus MC – University Medical Centre Rotterdam, Rotterdam, The Netherlands

6. Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands

Abstract

Machine learning approaches are increasingly suggested as tools to improve prediction of clinical outcomes. We aimed to identify when machine learning methods perform better than a classical learning method. To this end, we examined the impact of the data-generating process on the relative predictive accuracy of six machine and statistical learning methods: bagged classification trees, stochastic gradient boosting machines using trees as the base learners, random forests, the lasso, ridge regression, and unpenalized logistic regression. We performed simulations in two large cardiovascular datasets, each of which comprised an independent derivation sample and validation sample collected in temporally distinct periods: patients hospitalized with acute myocardial infarction (AMI, n = 9484 vs. n = 7000) and patients hospitalized with congestive heart failure (CHF, n = 8240 vs. n = 7608). We used six data-generating processes, one based on each of the six learning methods, to simulate outcomes in the derivation and validation samples from 33 predictors in the AMI dataset and 28 predictors in the CHF dataset. We applied the six prediction methods in each simulated derivation sample and evaluated performance in the simulated validation samples according to the c-statistic, generalized R², Brier score, and calibration. While no method had uniformly superior performance across all six data-generating processes and eight performance metrics, (un)penalized logistic regression and boosted trees tended to have superior performance to the other methods across a range of data-generating processes and performance metrics. This study confirms that classical statistical learning methods perform well in low-dimensional settings with large datasets.
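
Two of the performance measures named in the abstract, the c-statistic and the Brier score, can be illustrated with a minimal sketch. It is not the authors' code: the single standard-normal predictor, the coefficients (-1.0 and 1.0), the sample size of 1000, and all function names are assumptions made for illustration, and the outcomes are simulated from a logistic data-generating process only as an example of the general idea.

```python
import math
import random


def c_statistic(y_true, y_prob):
    """Concordance: chance that a random event has a higher predicted
    risk than a random non-event, counting ties as one half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_outcomes(risks, rng):
    """Draw one Bernoulli outcome per subject from its true risk."""
    return [1 if rng.random() < r else 0 for r in risks]


# Hypothetical validation sample (assumed values, not the AMI or CHF data).
rng = random.Random(0)
x_val = [rng.gauss(0.0, 1.0) for _ in range(1000)]
true_risk = [logistic(-1.0 + 1.0 * x) for x in x_val]
y_val = simulate_outcomes(true_risk, rng)

# Compare the true risks with an uninformative prediction (the prevalence).
prevalence = sum(y_val) / len(y_val)
null_pred = [prevalence] * len(y_val)

print("true model:", c_statistic(y_val, true_risk), brier_score(y_val, true_risk))
print("null model:", c_statistic(y_val, null_pred), brier_score(y_val, null_pred))
```

A constant prediction has a c-statistic of exactly 0.5 because every event/non-event pair is tied, whereas predictions that track the true risk score above 0.5 and achieve a lower Brier score.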

Funder

Canadian Institutes of Health Research

National Center for Advancing Translational Sciences

Heart and Stroke Foundation of Canada

Publisher

SAGE Publications

Subject

Health Information Management, Statistics and Probability, Epidemiology
