Diabetes Diagnosis through Machine Learning: Investigating Algorithms and Data Augmentation for Class Imbalanced BRFSS Dataset

Published: 2023-10-19

Author:

Chowdhury Mohammad Mihrab, Ayon Ragib Shahariar, Hossain Md Sakhawat

Abstract

Diabetes is a prevalent chronic condition, and diagnosing it early and identifying at-risk individuals remain significant challenges. Machine learning plays a crucial role in diabetes detection because it can process large volumes of data and identify complex patterns. However, imbalanced data, in which diabetic cases are substantially outnumbered by non-diabetic cases, makes it harder for machine learning algorithms to identify individuals with diabetes. Our study focuses on predicting whether a person is at risk of diabetes based on the individual’s health and socio-economic conditions, while mitigating the challenges posed by imbalanced data. To minimize the impact of the imbalance, we applied several data augmentation techniques to the training data before training the machine learning algorithms: oversampling (SMOTE-N), undersampling (ENN), and hybrid sampling (SMOTE-Tomek and SMOTE-ENN). Our study highlights the importance of applying data augmentation techniques carefully, without any data leakage, to improve the effectiveness of machine learning algorithms. Moreover, it offers a complete machine learning pipeline for healthcare practitioners, from data acquisition to ML prediction, enabling them to develop data-informed strategies.
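
The leakage-free ordering described in the abstract (split the data first, then resample only the training portion) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a binary target and categorical features, and `stratified_split` and `smoten_like` are hypothetical helpers. `smoten_like` is a simplified SMOTE-N that uses Hamming distance in place of the Value Difference Metric; each synthetic row takes, feature by feature, the most common category among the nearest minority-class neighbours of a randomly drawn minority row.

```python
import random
from collections import Counter


def hamming(a, b):
    """Count the features on which two categorical rows differ."""
    return sum(x != y for x, y in zip(a, b))


def stratified_split(X, y, test_frac=0.2, seed=0):
    """Hold out a test set per class so both parts keep the class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    X_te = [X[i] for i in test_idx]
    y_te = [y[i] for i in test_idx]
    return X_tr, y_tr, X_te, y_te


def smoten_like(X, y, k=5, seed=0):
    """Oversample the minority class of a binary target (simplified SMOTE-N).

    Assumption: Hamming distance stands in for the Value Difference Metric.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    minority, n_min = min(counts.items(), key=lambda kv: kv[1])
    n_needed = max(counts.values()) - n_min
    X_min = [row for row, t in zip(X, y) if t == minority]
    if len(X_min) < 2:
        raise ValueError("need at least two minority rows to find neighbours")
    k = min(k, len(X_min) - 1)
    synthetic = []
    for _ in range(n_needed):
        base = rng.randrange(len(X_min))
        # k nearest minority neighbours of the drawn row, excluding the row itself
        neighbours = sorted(
            (j for j in range(len(X_min)) if j != base),
            key=lambda j: hamming(X_min[base], X_min[j]),
        )[:k]
        # mode of each feature over the neighbours (ties go to the first value seen)
        synthetic.append([
            Counter(X_min[j][f] for j in neighbours).most_common(1)[0][0]
            for f in range(len(X_min[base]))
        ])
    return X + synthetic, y + [minority] * n_needed


if __name__ == "__main__":
    rng = random.Random(42)
    # toy data: three binary features and one 1-5 ordinal feature, 10% positive
    X = [[rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1), rng.randint(1, 5)]
         for _ in range(200)]
    y = [1] * 20 + [0] * 180
    # 1) split first, so the test set never contains synthetic rows
    X_tr, y_tr, X_te, y_te = stratified_split(X, y, test_frac=0.2)
    # 2) resample only the training portion
    X_bal, y_bal = smoten_like(X_tr, y_tr, k=5)
    print("train before:", dict(Counter(y_tr)))
    print("train after: ", dict(Counter(y_bal)))
    print("test (untouched):", dict(Counter(y_te)))
```

With this toy data the training split goes from 144 negatives and 16 positives to 144 of each, while the 40-row test split (36 negatives, 4 positives) is left unchanged. For real experiments, the imbalanced-learn library provides `SMOTEN`, `EditedNearestNeighbours`, `SMOTETomek`, and `SMOTEENN`, and its `Pipeline` applies a sampler only during `fit`, which keeps resampling inside each training fold during cross-validation.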

Publisher

Cold Spring Harbor Laboratory

References (75 articles)

1. Connecting obesity, aging and diabetes.

2. R. Alejo, J. M. Sotoca, R. M. Valdovinos, and P. Toribio. Edited nearest neighbor rule for improving neural networks classifications. In Advances in Neural Networks - ISNN 2010: 7th International Symposium on Neural Networks, ISNN 2010, Shanghai, China, June 6-9, 2010, Proceedings, Part I, pages 303–310. Springer, 2010.

3. An approach for classification of highly imbalanced data using weighting and undersampling. Amino Acids, 2010.

4. D. Asiimwe, G. O. Mauti, and R. Kiconco. Prevalence and risk factors associated with type 2 diabetes in elderly patients aged 45-80 years at Kanungu District. Journal of Diabetes Research, 2020:1–5, 2020.

5. Diagnosis and Classification of Diabetes Mellitus.