Affiliation:
1. The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital
Abstract
Objectives
An accurate prediction model for hyperuricemia (HUA) is urgently needed. This study aimed to develop a stacking ensemble prediction model for the risk of hyperuricemia and to identify the contributing risk factors.
Methods
A prospective health checkup cohort of 40899 subjects was examined and randomly divided into training and validation sets at a ratio of 7:3, and the ROSE sampling technique was then used to handle class imbalance. LASSO regression was employed to select important predictive features. A stacking ensemble model was constructed from three individual models: Support Vector Machine (SVM), Decision Tree C5.0 (C5.0), and eXtreme Gradient Boosting (XGBoost). The models were validated using the area under the receiver operating characteristic curve (AUC) and the calibration curve, together with accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score, on both the validation set and the extra-validation set. The iBreakdown algorithm was used to interpret the black-box ensemble model and to identify contributing risk factors.
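The validation metrics named above can be illustrated with a minimal Python sketch. It is not the study's code: the function names (`confusion_metrics`, `auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the standard pairwise-ranking definition rather than any particular package.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f1: float


def confusion_metrics(y_true, y_pred):
    """Threshold-based metrics from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=ratio(tp, tp + fn),      # true positive rate
        specificity=ratio(tn, tn + fp),      # true negative rate
        ppv=ratio(tp, tp + fp),              # positive predictive value
        npv=ratio(tn, tn + fn),              # negative predictive value
        f1=ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    )


def auc(y_true, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical, not from the study): 1 = HUA, 0 = no HUA.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(confusion_metrics(labels, preds))
    print(auc(labels, scores))
```

AUC is threshold-free, whereas the other six metrics depend on the chosen cut-off, which is one reason to report both kinds alongside a calibration curve.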
Results
Fifteen important features were selected from 23 clinical variables. The stacking ensemble model achieved an AUC of 0.854, outperforming the other three models, SVM, C5.0, and XGBoost, which had AUCs of 0.848, 0.851, and 0.849, respectively. Calibration and other metrics, including accuracy, specificity, NPV, and F1 score, also demonstrated the ensemble model’s superiority over the other three models. The contributing risk factors were estimated using six randomly selected subjects, which showed that being female and relatively younger, together with having higher BUA, BMI, GGT, TP, TG, Cr, and FBG values, can increase the risk of HUA. To further validate the model’s applicability in the health checkup population, we used another cohort of 8559 subjects, in which the ensemble prediction model also performed favorably, with an AUC of 0.846.
Conclusions
In this study, a stacking ensemble prediction model for the risk of HUA was developed. It outperformed the individual machine-learning models that compose it, and the contributing risk factors were identified, offering insight into HUA risk.
Publisher
Research Square Platform LLC