BACKGROUND
Various machine learning (ML) prediction models have recently been developed for cardiovascular disease (CVD) in type 2 diabetes mellitus (T2DM); however, most incorporate only a limited set of risk factors, which restricts their predictive power.
OBJECTIVE
This study aimed to evaluate the validity and usefulness of an ML model for predicting the 3-year incidence of CVD in patients with T2DM.
METHODS
We used data from two independent cohorts, the discovery cohort (one hospital; n=12,809) and the validation cohort (two hospitals; n=2,019), to predict CVD. The outcome of interest was the presence or absence of CVD at 3 years. We trained several ML-based models with hyperparameter tuning in the discovery cohort and evaluated them in the validation cohort using the area under the receiver operating characteristic curve (AUROC).
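As an illustration of the AUROC metric used for evaluation, the following is a minimal Python sketch that computes AUROC as the probability that a randomly chosen patient with CVD receives a higher predicted risk than a randomly chosen patient without CVD. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study data; the study's actual analysis pipeline is not described here.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    labels: 1 = CVD at 3 years, 0 = no CVD (hypothetical encoding).
    scores: predicted risk from any model; higher means higher risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy, made-up data for illustration only (not study data).
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
toy_auroc = auroc(toy_labels, toy_scores)  # 8 of 9 pairs correctly ordered
```

The pairwise form is quadratic in cohort size and is meant only to make the definition concrete; a rank-based implementation would be preferable for cohorts of the size reported here.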
RESULTS
The study dataset included 12,809 (discovery) and 2,019 (validation) patients with T2DM recruited between 2008 and 2022. CVD was observed in 1,238 (10.2%) patients in the discovery cohort. The random forest (RF) model had a mean AUROC of 0.830 (95% confidence interval 0.816–0.845) in the discovery dataset. When applied to the external validation dataset, the RF model showed the best performance among the models, with an AUROC of 0.72 (accuracy of 65.4%, sensitivity of 66.0%, specificity of 65.4%, and balanced accuracy of 65.7%). Creatinine and glycated hemoglobin levels were the most influential factors in the RF model.
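The balanced accuracy reported above is the arithmetic mean of sensitivity and specificity. A minimal Python sketch of that relationship follows; the function name `balanced_accuracy` is hypothetical, and only the reported percentages are used as input.

```python
def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy: the mean of sensitivity and specificity (same units as inputs)."""
    return (sensitivity + specificity) / 2


# Reported external-validation values for the RF model, in percent.
reported_sensitivity = 66.0
reported_specificity = 65.4
reported_balanced = balanced_accuracy(reported_sensitivity, reported_specificity)
print(round(reported_balanced, 1))  # 65.7
```

Because it weights both classes equally, balanced accuracy is less affected than plain accuracy by the low proportion of patients with CVD.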
CONCLUSIONS
This study demonstrates the usefulness and feasibility of ML for assessing CVD incidence in patients with T2DM and suggests its potential for use in patient screening. Further international studies are required to validate our findings.