The Impact of Time Horizon on Classification Accuracy: Application of Machine Learning to Prediction of Incident Coronary Heart Disease

Published: 2022-11-02 Issue: 2 Volume: 6 Page: e38040
ISSN: 2561-1011
Container-title: JMIR Cardio
Language: en
Short-container-title: JMIR Cardio

Author:

Simon Steven, Mandair Divneet, Albakri Abdel, Fohner Alison, Simon Noah, Lange Leslie, Biggs Mary, Mukamal Kenneth, Psaty Bruce, Rosenberg Michael

Abstract

Background: Many machine learning approaches are limited to classification of outcomes rather than longitudinal prediction. One strategy for using machine learning in clinical risk prediction is to classify outcomes over a given time horizon. However, it is not well known how to identify the optimal time horizon for risk prediction.

Objective: In this study, we aimed to identify an optimal time horizon for classification of incident myocardial infarction (MI) using machine learning approaches looped over outcomes with increasing time horizons. Additionally, we sought to compare the performance of these models with the traditional Framingham Heart Study (FHS) coronary heart disease gender-specific Cox proportional hazards regression model.

Methods: We analyzed data from a single clinic visit of 5201 participants of a cardiovascular health study. We examined 61 variables collected from this baseline exam, including demographic and biologic data, medical history, medications, serum biomarkers, electrocardiographic data, and echocardiographic data. We compared several machine learning methods (eg, random forest, L1 regression, gradient boosted decision tree, support vector machine, and k-nearest neighbor) trained to predict incident MI that occurred within time horizons ranging from 500 to 10,000 days of follow-up. Models were compared on a 20% held-out testing set using area under the receiver operating characteristic curve (AUROC). Variable importance was assessed for the random forest and L1 regression models across time points. We compared results with the FHS coronary heart disease gender-specific Cox proportional hazards regression functions.

Results: There were 4190 participants included in the analysis, with 2522 (60.2%) female participants and an average age of 72.6 years. Over 10,000 days of follow-up, there were 813 incident MI events. The machine learning models were most predictive over moderate follow-up time horizons (ie, 1500-2500 days). Overall, the L1 (Lasso) logistic regression demonstrated the strongest classification accuracy across all time horizons. This model was most predictive at 1500 days of follow-up, with an AUROC of 0.71. The most influential variables differed by follow-up time and model, with gender being the most important feature for the L1 regression and weight for the random forest model across all time frames. Compared with the Framingham Cox function, the L1 and random forest models performed better across all time frames beyond 1500 days.

Conclusions: In a population free of coronary heart disease, machine learning techniques can be used to predict incident MI at varying time horizons with reasonable accuracy, with the strongest prediction accuracy in moderate follow-up periods. Validation across additional populations is needed to confirm the validity of this approach in risk prediction.
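
The horizon-based evaluation described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each participant has a follow-up time in days and an event flag, labels an outcome as positive when MI occurred on or before the horizon, and computes AUROC with the standard pairwise (Mann-Whitney) definition. The function names, the toy data, and the use of one fixed risk score are hypothetical; the study trained separate models per horizon, and the abstract does not say how participants censored before a horizon were handled (here they are simply counted as negatives).

```python
from typing import Dict, List, Sequence


def label_within_horizon(days_to_event: Sequence[float],
                         had_event: Sequence[bool],
                         horizon_days: float) -> List[int]:
    """Return 1 if the event occurred on or before the horizon, else 0.

    Assumption: participants without an event by the horizon (including
    those censored earlier) are treated as negatives.
    """
    return [1 if event and days <= horizon_days else 0
            for days, event in zip(days_to_event, had_event)]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_horizon(days_to_event: Sequence[float],
                     had_event: Sequence[bool],
                     scores: Sequence[float],
                     horizons: Sequence[float]) -> Dict[float, float]:
    """Relabel the outcome at each horizon and score the same risk estimates."""
    return {h: auroc(label_within_horizon(days_to_event, had_event, h), scores)
            for h in horizons}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    days = [400, 1200, 2000, 3000, 9000, 10000]
    events = [True, True, True, False, True, False]
    risk = [0.9, 0.7, 0.6, 0.4, 0.5, 0.1]
    for horizon, value in auroc_by_horizon(days, events, risk,
                                           [500, 1500, 2500, 10000]).items():
        print(f"horizon={horizon} days, AUROC={value:.2f}")
```

In a fuller pipeline, the fixed `risk` list would be replaced by predictions from a model refit on the training split for each horizon, with AUROC computed on the held-out split.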

Publisher

JMIR Publications Inc.

Subject

Cardiology and Cardiovascular Medicine,Health Informatics

References: 29 articles.

1. Predicting Atrial Fibrillation and Its Complications

2. Development of a risk score for atrial fibrillation (Framingham Heart Study): a community-based cohort study

3. A Clinical Risk Score for Atrial Fibrillation in a Biracial Prospective Cohort (from the Atherosclerosis Risk In Communities [ARIC] Study)

4. Simple Risk Model Predicts Incidence of Atrial Fibrillation in a Racially and Geographically Diverse Population: the CHARGE‐AF Consortium

Cited by 3 articles.

1. Comparison of machine learning models to predict the risk of breast cancer-related lymphedema among breast cancer survivors: a cross-sectional study in China. Frontiers in Oncology, 2024-02-12

2. Utility of prescription-based comorbidity indices for predicting mortality among Australian men with prostate cancer. Cancer Epidemiology, 2024-02

3. Predicting Coronary Heart Disease Using an Improved LightGBM Model: Performance Analysis and Comparison. IEEE Access, 2023