Affiliation:
1. Department of Computer Science and Software Engineering, University of Hail, Kingdom of Saudi Arabia
2. B.S. Abdur Rahman Crescent Institute of Science and Technology Chennai-48, India
3. SRM University Delhi-NCR, Sonepat, Haryana, India
Abstract
Background:
The prevalence of diabetes has been rising in recent years, and prior research has demonstrated
Machine Learning Techniques (MLTs) to be useful tools for predicting it. This research
has examined the accuracy of six different MLTs for predicting diabetes using lifestyle
data gathered from UCI (University of California, Irvine). Early prediction of diabetes is
necessary to improve medical outcomes and prevent its onset. This research proposes a new
framework for the early detection of diabetes using lifestyle factors. Various MLTs, such as
Logistic Regression (LR), Decision Tree Classification (DTC), Random Forest Classification (RFC),
Support Vector Classification (SVC), and K-Nearest Neighbors Classification (KNC), have been
evaluated using tenfold cross-validation, and the results obtained from the different techniques
have been verified. Among all
classification techniques, LR achieved the highest accuracy (93%), with a precision of 92%, a
recall of 94%, an F1 score of 93%, and a weighted average of 90%. The proposed framework can be
used by the healthcare sector to predict diabetes early. It can also be used with datasets from
other sectors that share diabetes-related data.
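The tenfold cross-validation mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the shuffling seed, and the majority-class stand-in model are all assumptions made for illustration, and any of the listed classifiers could be plugged in through the `fit` and `predict` arguments.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs over range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train_idx, test_idx))
    return splits


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Average the held-out accuracy of a model over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical stand-in model (an assumption): always predicts the
# most frequent label seen in the training fold.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model
```

Each sample is held out exactly once, so the averaged score uses every record for testing while never testing a model on data it was trained on.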
Method:
In this paper, we use the proposed framework to predict Diabetes Mellitus (DM) in the
healthcare system, diagnose various ailments, and assess how well the machine learning algorithms
(MLAs) perform. The proposed system is based on MLTs for the classification of DM. The intelligent
framework for DM, developed using MLTs, covers the full workflow from data input to output.
The five algorithms, Logistic Regression (LR), Decision Tree Classification
(DTC), Random Forest Classification (RFC), Support Vector Classification (SVC), and K-Nearest
Neighbors Classification (KNC), have been compared in terms of accuracy, precision, recall, and F1 score.
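For reference, the four comparison metrics named above can be computed from true and predicted labels as in the following sketch. The function name, the choice of `1` as the positive (diabetic) label, and the example labels are assumptions for illustration only; they are not taken from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels, purely to show the call.
example = classification_metrics([1, 1, 1, 0, 0, 1, 0, 0],
                                 [1, 0, 1, 0, 1, 1, 0, 0])
print(example)
```

Precision penalizes false alarms and recall penalizes missed cases; F1 is their harmonic mean, which is why the metrics are usually reported together instead of relying on accuracy alone.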
Results:
Results have been obtained from the experimental setting, in which MLTs predict DM from lifestyle
predictors. Descriptive statistics of the lifestyle characteristics have been reported, including
the count, mean, standard deviation, minimum, quartiles, and maximum. For instance, the count,
mean, standard deviation, minimum, 25th, 50th, and 75th percentiles, and maximum of the age
attribute were 520.0, 48.02, 12.151, 16.0, 39.0, 47.5, 57.0, and 90.0, respectively,
as shown in Fig. (10). Feature engineering is crucial to the process of constructing MLT models. Insignificant
or incorrect features may degrade a model's performance. Careful feature selection drastically
reduces training time and increases accuracy. In machine learning
frameworks, feature selection strategies include filter, wrapper, embedded, and
hybrid techniques. An alarming number of people around the world suffer from the chronic and dangerous
disease of diabetes. Using MLTs, biological variables relevant to early DM prediction have been
obtained in this research work. Data on patients’ lifestyles have been thoroughly examined in order
to create a framework. Canonical Correlation Analysis (CCA) has been used to select the ideal
combination of lifestyle features. Finally, five alternative machine learning techniques have been
applied to disease prediction using 10-fold cross-validation.
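The eight-value summary reported for the age attribute above (count, mean, standard deviation, minimum, three quartiles, maximum) can be reproduced for any numeric column with a short sketch like the one below. The function name and the sample ages are hypothetical, and the sample standard deviation and linearly interpolated quartiles are assumed conventions, since the source does not state which were used.

```python
import statistics


def describe(values):
    """Summarize a numeric column with eight descriptive statistics."""
    # "inclusive" interpolates linearly between the observed data points.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": float(len(values)),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical ages, not taken from the dataset used in the paper.
ages = [16, 30, 39, 45, 50, 57, 62, 90]
for name, value in describe(ages).items():
    print(f"{name:>5}: {value}")
```

Checking such a summary before modeling helps reveal out-of-range or missing values that would otherwise distort feature selection and training.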
Conclusion:
To our knowledge, this is the first time a framework has been proposed that yields
prediction results substantially better than those of earlier research. The results of the proposed
work have been found to be accurate and reliable according to the evaluation metrics.
Publisher
Bentham Science Publishers Ltd.