Affiliation:
1. TEKİRDAĞ NAMIK KEMAL ÜNİVERSİTESİ
Abstract
Diabetes is a disease that occurs when the body cannot regulate the level of sugar (glucose) in the blood. Early diagnosis is important for preventing more serious diseases that may arise later. This study set out to find the best-performing model for a diabetes data set by training several models on it. The models trained were Logistic Regression, KNN, SVM (Support Vector Machine), CART (Classification and Regression Trees), RF (Random Forest), AdaBoost, GBM (Gradient Boosting Machines), XGBoost (Extreme Gradient Boosting), LGBM (Light Gradient Boosting Machine), and CatBoost. Ranked by accuracy and F1 score, the best models were RF, LGBM, and XGBoost, in that order. The Random Forest model, which produced the most successful results, achieved an accuracy of 0.88, an F1 score of 0.84, and a ROC AUC of 0.95.
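As an illustration of the three metrics reported in the abstract, the following is a minimal sketch of how accuracy, F1 score, and ROC AUC can be computed for a binary classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example, and the numbers it produces are unrelated to the study's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of samples, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = diabetic, 0 = non-diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_score]    # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(f"Accuracy: {acc:.2f}, F1 Score: {f1:.2f}, ROC AUC: {auc:.2f}")
```

Accuracy and F1 depend on the chosen threshold, while ROC AUC is computed from the raw scores and measures ranking quality across all thresholds, which is why a model can report a ROC AUC noticeably higher than its accuracy.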
Publisher
International Scientific and Vocational Studies Journal