An Improved Ensemble Machine Learning Approach for Diabetes Diagnosis

Published:2024-04-04 Issue:3 Volume:32 Page:1335-1350
ISSN:2231-8526
Container-title:Pertanika Journal of Science and Technology
language:en
Short-container-title:JST

Author:

Mohammed Rashid Mohanad, Yaseen Omar Mahmood, Riyadh Saeed Rana, Alasaady Maher Talal

Abstract

Diabetes is recognized as one of the most detrimental diseases worldwide, characterized by elevated levels of blood glucose stemming from either insulin deficiency or decreased insulin efficacy. Early diagnosis of diabetes enables patients to initiate treatment promptly, thereby minimizing or eliminating the risk of severe complications. Although years of research in computational diagnosis have demonstrated that machine learning offers a robust methodology for predicting diabetes, existing models leave considerable room for improvement in terms of accuracy. This paper proposes an improved ensemble machine learning approach using multiple classifiers for diabetes diagnosis based on the Pima Indians Diabetes Dataset (PIDD). The proposed ensemble voting classifier amalgamates five machine learning algorithms: Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbor (KNN), Random Forests (RF), and XGBoost. We obtained the individual model accuracies and used the ensemble method to improve accuracy. The proposed approach uses a pre-processing stage of standardization and imputation and applies the Local Outlier Factor (LOF) to remove data anomalies. The model was evaluated using sensitivity, specificity, and accuracy criteria. With a reported accuracy of 81%, the proposed approach shows promise compared to prior classification techniques.
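The voting and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the ensemble uses hard (majority) or soft (probability-averaged) voting, so hard voting is assumed here. The function names `majority_vote` and `evaluate` are hypothetical, and the labels and per-model predictions are hand-made toy values, not results from the paper or from PIDD.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the label most models predicted.

    predictions_per_model is a list of equal-length label lists, one per classifier.
    """
    voted = []
    for sample_votes in zip(*predictions_per_model):
        # With five voters and two classes a tie cannot occur.
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


def evaluate(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, and accuracy for binary labels.

    Assumes y_true contains at least one positive and one negative sample.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy ground truth (1 = diabetic, 0 = non-diabetic) for eight hypothetical patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]

# Toy predictions standing in for the five base classifiers named in the abstract.
dt = [1, 0, 1, 0, 1, 0, 0, 1]
lr = [1, 1, 0, 0, 0, 0, 0, 0]
knn = [1, 0, 1, 1, 0, 0, 0, 1]
rf = [1, 1, 1, 0, 0, 0, 1, 0]
xgb = [1, 1, 1, 0, 1, 0, 0, 0]

ensemble = majority_vote([dt, lr, knn, rf, xgb])
metrics = evaluate(y_true, ensemble)
print(ensemble)
print(metrics)
```

In practice the base predictions would come from models fitted after the imputation, standardization, and LOF outlier-removal steps; those stages are omitted here to keep the sketch dependency-free.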

Publisher

Universiti Putra Malaysia
