BACKGROUND
Machine learning (ML) risk prediction models, although much more accurate than traditional statistical methods, are inconvenient to use in clinical practice because they lack transparency and require a large number of input variables.
OBJECTIVE
We aimed to develop a precise, explainable, and flexible ML model to predict the risk of in-hospital mortality in patients with ST-segment elevation myocardial infarction (STEMI).
METHODS
This study recruited 18,744 patients enrolled in the 2013 China Acute Myocardial Infarction (CAMI) registry and 12,018 patients from the China Patient-Centered Evaluative Assessment of Cardiac Events (PEACE)-Retrospective Acute Myocardial Infarction Study. The Extreme Gradient Boosting (XGBoost) model was derived from 9616 patients in the CAMI registry (2014, 89 variables) with 5-fold cross-validation and validated on both the 9125 patients in the CAMI registry (89 variables) and the independent China PEACE cohort (10 variables). The Shapley Additive Explanations (SHAP) approach was employed to interpret the complex relationships embedded in the proposed model.
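As an illustration of the additive attribution idea behind SHAP, the following minimal sketch computes exact Shapley values for a single hypothetical patient. The function `toy_risk`, its coefficients, and the reference patient are invented for illustration and are not the study's XGBoost model. Replacing absent features with a single baseline value is also a simplifying assumption, since SHAP implementations for tree models typically average over a background data set.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input x relative to a baseline input.

    Features outside a coalition are set to their baseline value
    (a simplifying assumption made for this sketch).
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Evaluate the model with coalition features taken from x
        # and all other features taken from the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    """Hypothetical risk score with one interaction term (illustration only)."""
    age, lvef, killip = features
    return 0.02 * age - 0.01 * lvef + 0.5 * killip + 0.001 * age * killip


patient = [70.0, 40.0, 3.0]    # hypothetical age, LVEF (%), Killip class
reference = [60.0, 55.0, 1.0]  # hypothetical baseline patient
phi = shapley_values(toy_risk, patient, reference)
print(dict(zip(["age", "lvef", "killip"], phi)))
```

The attributions sum to the difference between the patient's score and the reference score, which is the property that lets per-variable contributions be read as an additive explanation of a single prediction.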
RESULTS
In the XGBoost model for predicting all-cause in-hospital mortality, the 8 variables with the highest importance scores were age, left ventricular ejection fraction, Killip class, heart rate, creatinine, blood glucose, white blood cell count, and use of angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin II receptor blockers (ARBs). The area under the curve (AUC) on the CAMI validation set was 0.896 (95% CI 0.884-0.909), significantly higher than that of previous models: the AUC was 0.809 (95% CI 0.790-0.828) for the Global Registry of Acute Coronary Events (GRACE) model and 0.782 (95% CI 0.763-0.800) for the TIMI model. Although the China PEACE validation set had only 10 available variables, the AUC reached 0.840 (95% CI 0.829-0.852), a substantial improvement over the GRACE (0.762, 95% CI 0.748-0.776) and TIMI (0.789, 95% CI 0.776-0.803) scores. Several novel and nonlinear relationships were discovered between patients’ characteristics and in-hospital mortality, including a U-shaped pattern for high-density lipoprotein cholesterol (HDL-C).
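For readers unfamiliar with how an AUC and its confidence interval can be obtained, a minimal sketch follows. It computes the AUC as the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, and derives a percentile bootstrap interval. The bootstrap is an assumption of this sketch, as the method used to compute the reported intervals is not stated here, and the labels and scores are made-up toy values rather than study data.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    if len(set(labels)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = died in hospital, 0 = survived (hypothetical values).
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
toy_scores = [0.05, 0.10, 0.62, 0.20, 0.45, 0.30, 0.08, 0.81, 0.50, 0.33, 0.12, 0.25]
point = auc(toy_labels, toy_scores)
lower, upper = bootstrap_ci(toy_labels, toy_scores)
print(f"AUC {point:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

With cohorts of several thousand patients, the pairwise comparison above would be slow; a rank-based formula gives the same value more efficiently.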
CONCLUSIONS
The proposed ML risk prediction model was highly accurate in predicting in-hospital mortality. Its flexibility and explainability make the model convenient to use in clinical practice and could help guide patient management.
CLINICALTRIAL
ClinicalTrials.gov NCT01874691; https://clinicaltrials.gov/study/NCT01874691