Affiliation:
1. School of Science and Technology, International Hellenic University, 57001 Thessaloniki, Greece
Abstract
Despite medical advances in recent years, cardiovascular diseases (CVDs) remain a major contributor to rising mortality rates and are still difficult to predict despite extensive expertise. The healthcare sector stands to benefit significantly from large-scale data and the insights derived from it, which underscores the importance of integrating machine learning (ML) into CVD prevention strategies. In this study, we addressed the major issue of class imbalance in the Behavioral Risk Factor Surveillance System (BRFSS) 2021 heart disease dataset, which includes personal lifestyle factors, by exploring several resampling techniques: the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), SMOTE-Tomek, and SMOTE-Edited Nearest Neighbor (SMOTE-ENN). We then trained, tested, and evaluated multiple classifiers, including logistic regression (LR), decision trees (DTs), random forest (RF), gradient boosting (GB), XGBoost (XGB), CatBoost, and artificial neural networks (ANNs), and compared their performance with a primary focus on maximizing sensitivity for CVD risk prediction. In our experiments, the hybrid resampling techniques (SMOTE-Tomek and SMOTE-ENN) outperformed the other sampling techniques. Our proposed implementation combines SMOTE-ENN with CatBoost optimized through Optuna and achieves a recall of 88% and an area under the receiver operating characteristic (ROC) curve (AUC) of 82%.
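The oversampling idea behind SMOTE, and the recall (sensitivity) metric the study prioritizes, can be illustrated with a small sketch. This is not the authors' pipeline: it uses only the Python standard library, covers only the interpolation step of SMOTE (no ENN cleaning, no CatBoost, no Optuna), and the function names `smote_oversample` and `recall` as well as the toy data are assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    Hypothetical helper illustrating the core SMOTE idea only."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def recall(y_true, y_pred, positive=1):
    """Recall (sensitivity): true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy minority class (made-up values, not BRFSS data).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_oversample(minority, n_new=6, k=2)

# Toy labels: three positives, two of which are detected.
toy_recall = recall([1, 1, 0, 1], [1, 0, 0, 1])
```

Because each synthetic point is a convex combination of two real minority samples, it always lies on the segment between them. In practice, a maintained implementation such as `SMOTEENN` from the imbalanced-learn package would be the more likely choice than hand-written code.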