Affiliation:
1. Epidemiology and Statistics, School of Public Health, Jilin University
2. School of Mathematics and Statistics, Northeast Normal University, Changchun, Jilin, China
Abstract
Objective
The number of heart disease patients is increasing. Establishing a risk assessment model for chronic heart disease (CHD) based on risk factors is beneficial for early diagnosis and timely treatment of high-risk populations.
Methods
Four machine learning models (logistic regression, support vector machines (SVM), random forests, and extreme gradient boosting (XGBoost)) were used to classify CHD among 14 971 participants in the National Health and Nutrition Examination Survey from 2011 to 2018. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC).
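As an illustration of the evaluation metric only, the AUC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The following is a minimal sketch under that definition, not the authors' implementation; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study data.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the AUC.

    labels: iterable of 0/1 class labels (1 = positive case).
    scores: iterable of predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (illustrative values only).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; for a sample of the size reported here a rank-based formula or a library routine would be the more practical choice.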
Results
Among the four models, SVM had the best classification performance (AUC = 0.898), and the AUC values of logistic regression and random forest were 0.895 and 0.894, respectively. XGBoost performed the worst, with an AUC of 0.891, although there was no significant difference among the four algorithms. In the variable importance analysis, the three most important variables were taking low-dose aspirin, chest pain or discomfort, and total amount of dietary supplements taken.
Conclusion
All four machine learning classifiers can identify the occurrence of CHD based on population survey data. We also determined the contribution of each variable to the prediction; the effectiveness of these variables can be explored further in actual clinical data.
Publisher
Ovid Technologies (Wolters Kluwer Health)
Subject
Cardiology and Cardiovascular Medicine, General Medicine