The Impact of Time Horizon on Classification Accuracy: Application of Machine Learning to Prediction of Incident Coronary Heart Disease (Preprint)

Author:

Simon Steven, Mandair Divneet, Albakri Abdel, Fohner Alison, Simon Noah, Lange Leslie, Biggs Mary, Mukamal Ken, Psaty Bruce, Rosenberg Michael

Abstract

BACKGROUND

Many machine-learning (ML) approaches are limited to classification of outcomes rather than longitudinal prediction. One strategy for using ML in clinical risk prediction is to classify outcomes over a given time horizon. However, it is not well known how to identify the optimal time horizon for risk prediction.

OBJECTIVE

Here we aim to identify an optimal time horizon for classification of incident myocardial infarction using ML approaches looped over outcomes with increasing time horizons.

METHODS

We analyzed data from a single clinic visit of 5201 participants of the Cardiovascular Health Study. We examined 61 variables collected at this baseline exam, including demographic and biologic data, medical history, medications, serum biomarkers, and electrocardiographic and echocardiographic data. We compared several machine learning methods (Random Forest, L1 Regression, Gradient Boosted Decision Tree, Support Vector Machines, and K-Nearest Neighbor) trained to predict incident MI occurring within time horizons ranging from 500 through 10000 days of follow-up. Models were compared on a 20% held-out testing set using the area under the receiver operating characteristic curve (AUC). Variable importance was assessed for the Random Forest and L1 Regression models across timepoints. We compared results with the Framingham coronary heart disease sex-specific Cox proportional hazards regression functions.
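The horizon-by-horizon evaluation described above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`label_for_horizon`, `auc`, `auc_by_horizon`) are hypothetical, the rule of dropping participants censored before the horizon is an assumption (the abstract does not state how censoring was handled), and a fixed risk score stands in for the models that the study retrained at each horizon.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def label_for_horizon(days: float, had_mi: bool, horizon: float) -> Optional[int]:
    """Binary outcome for one participant at one time horizon.

    days    : days from baseline to MI, or to end of follow-up if no MI
    had_mi  : True if follow-up ended with an incident MI
    Returns 1 (MI within horizon), 0 (event-free through horizon), or
    None when follow-up ended before the horizon without an MI.
    ASSUMPTION: such censored participants are excluded.
    """
    if had_mi and days <= horizon:
        return 1
    if days >= horizon:
        return 0
    return None


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_horizon(
    follow_up: Sequence[Tuple[float, bool]],
    scores: Sequence[float],
    horizons: Sequence[float],
) -> Dict[float, float]:
    """Relabel the outcome at each horizon and score the same risk estimates."""
    results: Dict[float, float] = {}
    for horizon in horizons:
        labels: List[int] = []
        kept: List[float] = []
        for (days, had_mi), score in zip(follow_up, scores):
            y = label_for_horizon(days, had_mi, horizon)
            if y is not None:  # skip participants censored before the horizon
                labels.append(y)
                kept.append(score)
        results[horizon] = auc(labels, kept)
    return results
```

Calling `auc_by_horizon(follow_up, scores, range(500, 10001, 500))` would trace how discrimination changes with the horizon. The 500-day step is illustrative only, since the abstract gives only the range of 500 through 10000 days, and a full analysis would refit each model on the relabeled training set and score it on the 20% held-out set.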

RESULTS

There were 4190 participants included in the analysis, of whom 60.2% were female, with an average age of 72.6 years. Over the 10000 days of follow-up, there were 813 incident myocardial infarction events. The ML models were most predictive over moderate follow-up time horizons (1500-2500 days). Overall, the L1 (Lasso) logistic regression demonstrated the strongest classification accuracy across all time horizons. This model was most predictive at 1500 days of follow-up, with an AUC of 0.71. The most influential variables differed by follow-up time and model: across all timeframes, gender was the most important feature for the L1 regression and weight was the most important for the random forest. Compared with the Framingham Cox function, the L1 and random forest models performed better across all timeframes beyond 1500 days.

CONCLUSIONS

In a population free of coronary heart disease, machine learning techniques can be used to predict incident myocardial infarction at varying time horizons with reasonable accuracy, with the strongest prediction accuracy at moderate follow-up periods. Validation across additional populations is needed to confirm a role for this approach in risk prediction.

Publisher

JMIR Publications Inc.
