Enhanced Diabetic Prediction Using Fuzzy C-Means Preprocessing and Random Forest Ensemble Learning

Author:

Bhatti Priha, Mahboob Khalid, Naeem Syed Saad, Bhatti Iqra Heer, Kamran Noorulain

Abstract

Diabetes claims thousands of lives each year, and many people remain unaware of their condition until it reaches a critical stage. This study presents a data mining-based approach to improve the early detection and prediction of diabetes, using the Pima Indian Diabetes dataset. Although fuzzy C-means adapts to many data types, the outcome of the clustering depends on the initial placement of the cluster centers. Clustering precision also matters: it determines whether the random forest receives extensive, well-grouped data or limited data that constrains its efficacy. Our principal objective was to improve the accuracy of fuzzy C-means clustering and the random forest. To that end, we combined PCA, fuzzy C-means, and the random forest approach. We evaluated several algorithm combinations, and the results show that our model outperforms the original results on the Pima Indian Diabetes dataset in terms of accuracy. The diabetic prediction model reached 97.40% accuracy using PCA, logistic regression, and K-means. Using PCA with fuzzy C-means and random forests, it reached a higher accuracy of 98.96%. Empirical evidence shows that applying PCA improved the accuracy of both the fuzzy C-means clustering approach and the random forest classifier, a result that differs from previous findings.
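The abstract's point that fuzzy C-means depends on the initial placement of cluster centers can be illustrated with a minimal pure-Python sketch of the standard membership and center updates. This is not the authors' implementation: the function names (`memberships`, `fuzzy_c_means`), the fuzzifier `m = 2.0`, the stopping tolerance, and the two synthetic Gaussian blobs are all assumptions made for illustration. The Pima Indian Diabetes dataset is not used here, and the PCA and random forest stages are omitted.

```python
import random


def memberships(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster; each row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Euclidean distance to each center, floored to avoid division by zero.
        dists = [max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                 for c in centers]
        rows.append([1.0 / sum((dj / dk) ** exponent for dk in dists)
                     for dj in dists])
    return rows


def fuzzy_c_means(points, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Alternate membership and center updates, starting from init_centers."""
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each center is the mean of all points weighted by membership^m.
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Largest coordinate change of any center in this iteration.
        shift = max(abs(a - b)
                    for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


# Synthetic stand-in data (an assumption, not the Pima dataset):
# 30 points around (0, 0) followed by 30 points around (5, 5).
rng = random.Random(42)
data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
data += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(30)]

# The starting centers are chosen explicitly: one point from each blob.
centers, u = fuzzy_c_means(data, init_centers=[data[0], data[-1]])

# Hard labels (index of the largest membership) could be passed to a
# downstream classifier as an extra feature or used to filter samples.
hard_labels = [row.index(max(row)) for row in u]
print("centers:", centers)
```

Passing `init_centers` explicitly keeps the starting point visible and reproducible, so different initializations can be compared on the same data.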

Publisher

VFAST Research Platform
