Author:
Mohammed Rashid Mohanad, Yaseen Omar Mahmood, Riyadh Saeed Rana, Alasaady Maher Talal
Abstract
Diabetes is recognized as one of the most detrimental diseases worldwide, characterized by elevated blood glucose levels stemming from either insulin deficiency or decreased insulin efficacy. Early diagnosis enables patients to begin treatment promptly, thereby minimizing or eliminating the risk of severe complications. Although years of research in computational diagnosis have demonstrated that machine learning offers a robust methodology for predicting diabetes, existing models leave considerable room for improvement in accuracy. This paper proposes an improved ensemble machine learning approach that combines multiple classifiers for diabetes diagnosis on the Pima Indians Diabetes Dataset (PIDD). The proposed ensemble voting classifier amalgamates five machine learning algorithms: Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbor (KNN), Random Forest (RF), and XGBoost. We first measured the accuracy of each individual model and then applied the ensemble method to improve accuracy. The approach pre-processes the data with standardization and imputation and applies the Local Outlier Factor (LOF) to remove data anomalies. The model was evaluated using sensitivity, specificity, and accuracy. With a reported accuracy of 81%, the proposed approach shows promise compared to prior classification techniques.
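To make the voting and evaluation steps described in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: it assumes hard (majority) voting, which the abstract does not specify, and the function names `majority_vote` and `evaluate` as well as the label lists are hypothetical toy values chosen for illustration, not PIDD data or results from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, pick the label most models predicted.

    predictions_per_model is a list of equal-length label lists, one per
    model. With five binary classifiers a tie cannot occur.
    """
    n_samples = len(predictions_per_model[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        voted.append(votes.most_common(1)[0][0])
    return voted


def evaluate(y_true, y_pred, positive=1):
    """Return sensitivity, specificity, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        # Share of true diabetic cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of true non-diabetic cases that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy labels (1 = diabetic, 0 = non-diabetic), NOT PIDD data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical per-model predictions standing in for DT, LR, KNN, RF, XGBoost.
model_predictions = {
    "DT":      [1, 0, 0, 1, 0, 1, 1, 0],
    "LR":      [1, 0, 1, 0, 0, 1, 1, 0],
    "KNN":     [0, 0, 1, 1, 1, 0, 0, 0],
    "RF":      [1, 1, 1, 1, 0, 0, 0, 0],
    "XGBoost": [1, 0, 0, 1, 0, 1, 1, 1],
}

ensemble_pred = majority_vote(list(model_predictions.values()))
metrics = evaluate(y_true, ensemble_pred)

for name, preds in model_predictions.items():
    print(name, evaluate(y_true, preds)["accuracy"])
print("Ensemble", metrics)
```

Printing the per-model accuracies next to the ensemble's mirrors the comparison the abstract describes. A soft-voting variant would average predicted class probabilities instead of counting labels; the abstract does not say which variant was used.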
Publisher
Universiti Putra Malaysia