Enhanced Diabetic Prediction Using Fuzzy C-Means Preprocessing and Random Forest Ensemble Learning
Published: 2023-12-02
Issue: 4
Volume: 11
Page: 32-44
ISSN: 2309-3978
Container-title: VFAST Transactions on Software Engineering
Short-container-title: VFAST trans. softw. eng.
Author:
Bhatti Priha, Mahboob Khalid, Naeem Syed Saad, Bhatti Iqra Heer, Kamran Noorulain
Abstract
Diabetes claims thousands of lives each year, and many individuals remain unaware of their condition until it reaches a critical stage. This study presents a data mining-based approach to improving the early detection and prediction of diabetes, using the Pima Indian Diabetes dataset. Although fuzzy C-means adapts to many data types, the outcome of the clustering process depends on the initial placement of the cluster centers. Clustering precision also matters: it determines whether the random forest receives extensive, well-grouped data or limited data that constrains its efficacy. Our principal objective was to improve the accuracy of fuzzy C-means clustering and the random forest. To that end, we combined PCA, fuzzy C-means, and the random forest approach. Several algorithm combinations were evaluated, and the results show that our model surpasses the original outcomes on the Pima Indian Diabetes dataset in terms of accuracy. The diabetic prediction model reached 97.40% accuracy using PCA, logistic regression, and K-means, while PCA combined with fuzzy C-means and random forests reached a higher accuracy of 98.96%. The empirical evidence indicates that applying PCA improved the accuracy of both the fuzzy C-means clustering approach and the random forest classifier, which differs from previous findings.
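The abstract's remark that fuzzy C-means depends on the initial placement of cluster centers can be illustrated with a minimal sketch of the standard algorithm. This is not the authors' implementation: the function name `fuzzy_c_means`, its parameters (`m`, `n_iter`, `tol`, `seed`), and the toy data are assumptions made for illustration, and the PCA and random forest stages are omitted.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length tuples (illustrative).

    Returns (centers, memberships); memberships[j][i] is the degree to which
    points[j] belongs to cluster i, and each row sums to 1.
    """
    rng = random.Random(seed)
    # Initial centers are drawn from the data, so the result depends on `seed`.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    exponent = 2.0 / (m - 1.0)
    n_dims = len(points[0])
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)).
        memberships = []
        for p in points:
            # Clamp distances so a point sitting on a center does not divide by zero.
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
            )
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(n_dims)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once the centers have effectively stopped moving
            break
    return centers, memberships


# Toy data (made up for illustration): two well-separated groups of three points.
toy = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, memberships = fuzzy_c_means(toy, n_clusters=2, seed=0)
# Hard labels take the cluster with the largest membership for each point.
hard_labels = [row.index(max(row)) for row in memberships]
```

Changing `seed` changes the starting centers. On well-separated toy data the grouping is typically stable, but on overlapping data different starts can produce different clusters, which is the sensitivity the abstract describes.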
Publisher
VFAST Research Platform