BACKGROUND
Chronic heart failure is a serious threat to human health: its high morbidity and mortality impose a heavy burden on healthcare systems and society. The growing availability of medical data and the rapid development of machine learning create new opportunities to investigate the mechanisms of chronic heart failure and to build predictive models. A health ecology approach allows risk factors to be examined across environmental, social, and individual levels. This helps identify high-risk groups early and provides a scientific basis for precise prevention and intervention strategies.
OBJECTIVE
This study aims to use machine learning (ML) to construct a model that predicts the risk of developing chronic heart failure (CHF) and to analyze CHF risk from a health ecology perspective.
METHODS
This retrospective cohort study is based on the Jackson Heart Study. It included 2,553 patients who did not have heart failure at baseline; the outcome was the occurrence of chronic heart failure during a 10-year follow-up period. In the machine learning workflow, the data were first cleaned, and chi-square tests and principal component analysis were then used to select and interpret features. Finally, four models were built on the selected features: a decision tree, a random forest, XGBoost, and a stacked model.
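The chi-square screening step can be illustrated with a minimal sketch. It is not the study's code: the function names, the toy feature columns, and the toy outcome vector below are hypothetical and invented for illustration only. It computes the Pearson chi-square statistic between each categorical feature and a binary outcome and ranks features by that statistic; continuous variables such as age would first need to be binned.

```python
from collections import Counter


def chi_square_statistic(feature, outcome):
    """Pearson chi-square statistic for two categorical sequences."""
    if len(feature) != len(outcome):
        raise ValueError("feature and outcome must have the same length")
    n = len(feature)
    observed = Counter(zip(feature, outcome))  # joint cell counts
    row_totals = Counter(feature)              # counts per feature level
    col_totals = Counter(outcome)              # counts per outcome level
    stat = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n           # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def rank_features(columns, outcome):
    """Return (name, statistic) pairs sorted from strongest to weakest association."""
    scores = {name: chi_square_statistic(values, outcome)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data (NOT from the Jackson Heart Study).
toy_outcome = [1, 1, 1, 0, 0, 0, 0, 0]
toy_columns = {
    "ever_swelling": [1, 1, 1, 0, 0, 0, 0, 0],
    "insurance_type": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
ranked = rank_features(toy_columns, toy_outcome)
```

In practice the statistic would be converted to a p-value using the appropriate degrees of freedom, and features would be retained according to a chosen significance threshold; the threshold used in the study is not stated here.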
RESULTS
Feature selection yielded 20 risk factors: age, alcohol drinking, systolic blood pressure, glycosylated hemoglobin, high-sensitivity C-reactive protein, heart rate, insurance type, income, education, the proportion of the regional population living in poverty, neighborhood problems, favorable food stores (3 mile kernel), sportindex, activeindex, the medical institution usually visited, ever awakened by trouble breathing, ever had swelling of feet or ankles, marital status, the ratio of mv_peake to ma_peaka, and history of cardiovascular diseases. The best-performing model was XGBoost, with an accuracy of 0.889, a sensitivity of 0.919, and an F1 score of 0.859.
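For reference, the three reported metrics can be derived from a binary confusion matrix as in the sketch below. The function name and the toy labels are hypothetical and do not reproduce the study's predictions; the positive class (1) is assumed to denote incident chronic heart failure.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), and F1 score for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}


# Hypothetical toy labels (NOT the study's data).
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

Because F1 combines sensitivity with precision, a model can show a sensitivity higher than its F1 score when its precision is the lower of the two.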
CONCLUSIONS
This study proposes an ML-based model for predicting the risk of developing chronic heart failure. Features were selected with chi-square tests and PCA and interpreted within a health ecology framework. XGBoost outperformed the random forest (RF) and decision tree (DT) models. The model can predict disease onset accurately and rapidly, may offer new ideas for clinical diagnosis and the assessment of disease progression, and could serve as a real-time risk assessment and intervention tool for patients with chronic heart failure.