Abstract
This paper examines the use of Machine Learning (ML) techniques, specifically Logistic Regression and Random Forests, to predict the occurrence of stroke from demographic, clinical, and lifestyle factors. The models are developed and analyzed in Python, and the task is framed as binary classification: each individual is labeled as either having had a stroke or not. The dataset includes attributes such as age, gender, hypertension, and smoking status, among others, which are used to train and evaluate the models. Through extensive experimentation and evaluation, the paper demonstrates the effectiveness of both methods for stroke prediction: Logistic Regression provides a straightforward baseline, while Random Forests offer higher predictive accuracy. The findings highlight the importance of ML-based approaches in healthcare risk assessment and showcase Python's versatility in facilitating such analyses.
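The comparison described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: it assumes scikit-learn and NumPy are available, and it uses synthetic stand-in data with three made-up features (age, hypertension, smoking status) and an invented risk formula, since the paper's dataset and preprocessing are not given here. ROC AUC and balanced class weights are likewise assumptions, chosen because stroke cases are typically a small minority of records.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the paper's dataset).
rng = np.random.default_rng(0)
n_samples = 2000
age = rng.uniform(20, 90, n_samples)
hypertension = rng.integers(0, 2, n_samples)
smoker = rng.integers(0, 2, n_samples)
X = np.column_stack([age, hypertension, smoker])

# Hypothetical risk formula, used only to generate imbalanced binary labels.
logit = -7.0 + 0.07 * age + 0.8 * hypertension + 0.5 * smoker
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Stratify so the rare positive class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Baseline: scaled features feeding a logistic regression.
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    # Ensemble of decision trees; no feature scaling needed.
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
}

# Fit each model and score it by ROC AUC on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    stroke_probability = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, stroke_probability)
    print(f"{name}: ROC AUC = {scores[name]:.3f}")
```

On synthetic data like this, the ranking of the two models says nothing about the paper's findings; the sketch only shows the shape of the train, predict, and evaluate loop for the two classifiers.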
Publisher
Centre for Evaluation in Education and Science (CEON/CEES)