Author:
Yan Chao, Xing Yiluan, Liu Sensen, Gao Erdi, Wang Jinyin
Abstract
Cardiovascular diseases (CVDs) pose a significant threat to global public health, affecting individuals across age groups. Factors such as cholesterol levels, smoking, alcohol consumption, and physical inactivity contribute to their onset and progression. Understanding CVD etiology well enough to inform targeted interventions for prevention and management remains a critical challenge. In this study, we address the task of predicting the likelihood that an individual develops a CVD using machine learning techniques. Specifically, we explore three approaches: the k-nearest neighbors (KNN) algorithm, logistic regression, and the random forest algorithm. Using a dataset sourced from Kaggle that covers 11 relevant factors, we conduct a series of experiments to identify the most influential predictors of CVDs. Our analysis aims not only to forecast disease occurrence but also to identify the primary determinants contributing to it. A comparison of the three methods shows that the random forest algorithm achieves the highest predictive accuracy. This research is a step towards using machine learning to improve our understanding of CVD dynamics and to inform targeted interventions for disease prevention and management.
Publisher
Cold Spring Harbor Laboratory