Predicting Chronic Kidney Disease Using Hybrid Machine Learning Based on Apache Spark

Published: 2022-02-23
Volume: 2022, Pages: 1-12
ISSN: 1687-5273
Journal: Computational Intelligence and Neuroscience
Language: English

Author:

Abdel-Fattah Manal A¹, Othman Nermin Abdelhakim¹,², Goher Nagwa¹,³

Affiliation:

1. Department of Information Systems, Faculty of Computers and Artificial Intelligence, Helwan University, Cairo, Egypt

2. Faculty of Informatics and Computer Science, British University in Egypt, Cairo, Egypt

3. Department of Information Systems, Faculty of Computer Science, Nahda University in Beni Suef, Beni Suef, Egypt

Abstract

Chronic kidney disease (CKD) has become widespread. It is associated with various serious risks, such as cardiovascular disease, heightened risk, and end-stage renal disease, which may be avoided through early detection and treatment of people at risk. Machine learning algorithms can significantly help medical scientists diagnose the disease accurately at an early stage. Recently, big data platforms have been integrated with machine learning algorithms to add value to healthcare. This paper therefore proposes hybrid machine learning techniques that combine feature selection methods with machine learning classification algorithms on a big data platform (Apache Spark) to detect CKD. Two feature selection techniques, Relief-F and the chi-squared method, were applied to select the important features. Six classification algorithms were used: decision tree (DT), logistic regression (LR), Naive Bayes (NB), random forest (RF), support vector machine (SVM), and the gradient-boosted trees (GBT) classifier, the last two being ensemble learning algorithms. Four evaluation metrics (accuracy, precision, recall, and F1-measure) were used to validate the results. For each algorithm, cross-validation and test results were computed using the full feature set, the features selected by Relief-F, and the features selected by the chi-squared method. The SVM, DT, and GBT classifiers with the selected features achieved the best performance, at 100% accuracy. Overall, the features selected by Relief-F outperformed both the full feature set and the features selected by the chi-squared method.
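The abstract names Relief-F and chi-squared feature selection but does not say how the scores were computed on Spark. The following is a minimal, standard-library-only sketch of the two ranking ideas, not the authors' implementation and not a Spark API. It assumes numeric features for ReliefF (differences scaled by each feature's range, every instance used once as a probe) and discrete feature values for the Pearson chi-squared statistic. The function names `relieff_scores`, `chi2_scores`, and `top_k` are hypothetical.

```python
from collections import Counter

# Hypothetical helpers illustrating the two feature-ranking ideas; not the paper's code.


def relieff_scores(X, y, k=1):
    """ReliefF weights for numeric features; every instance is used once as a probe."""
    n, d = len(X), len(X[0])
    # Scale per-feature differences to [0, 1] using each feature's observed range.
    span = [(max(r[a] for r in X) - min(r[a] for r in X)) or 1.0 for a in range(d)]

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        return sum(diff(a, r1, r2) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        # Group all other instances by their class label.
        by_class = {}
        for j in range(n):
            if j != i:
                by_class.setdefault(y[j], []).append(j)
        for c, idx in by_class.items():
            nearest = sorted(idx, key=lambda j: dist(X[i], X[j]))[:k]
            # Near hits lower a feature's weight; near misses raise it, scaled by class prior.
            scale = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for j in nearest:
                for a in range(d):
                    w[a] += scale * diff(a, X[i], X[j]) / (n * k)
    return w


def chi2_scores(X, y):
    """Pearson chi-squared statistic between each discrete feature and the class label."""
    n, d = len(X), len(X[0])
    classes = Counter(y)
    scores = []
    for a in range(d):
        values = Counter(r[a] for r in X)
        observed = Counter((r[a], label) for r, label in zip(X, y))
        stat = 0.0
        for v, nv in values.items():
            for c, nc in classes.items():
                expected = nv * nc / n  # cell count expected if feature and label were independent
                stat += (observed[(v, c)] - expected) ** 2 / expected
        scores.append(stat)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)[:k]


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
y = [0, 0, 1, 1]
print(relieff_scores(X, y))          # [1.0, -1.0]
print(chi2_scores(X, y))             # [4.0, 0.0]
print(top_k(chi2_scores(X, y), 1))   # [0]
```

Both scorers rank the informative feature first on this toy data. ReliefF rewards features that differ between nearest neighbours of opposite classes, while chi-squared measures dependence between a feature's values and the label without looking at neighbourhoods, which is one plausible reason the two methods can select different feature subsets.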
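The four evaluation metrics named in the abstract can be computed from a binary confusion matrix. This is a minimal sketch assuming a single positive class (for example CKD labelled 1). The function name `binary_metrics` is hypothetical. Spark ML's `MulticlassClassificationEvaluator` offers comparable metrics, but its `"f1"` metric is weighted across classes, so figures from a Spark pipeline need not match this single-class version exactly.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are positive
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


m = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(m["accuracy"])  # 0.6
```

At 100% accuracy, as reported for SVM, DT, and GBT, there are no false positives or false negatives, so precision, recall, and F1 are all 1.0 as well.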

Publisher

Hindawi Limited

Subject

General Mathematics, General Medicine, General Neuroscience, General Computer Science

Link

http://downloads.hindawi.com/journals/cin/2022/9898831.pdf

Cited by 27 articles.

1. Optimization assisted ensemble classification for prediction of chronic kidney disease; Multimedia Tools and Applications; 2024-07-16

2. Chronic Kidney Disease Patients in Predicting Renal Function Decline Integration of Machine Learning Techniques; 2023 4th International Conference on Intelligent Technologies (CONIT); 2024-06-21

3. An effective role-oriented binary Walrus Grey Wolf approach for feature selection in early-stage chronic kidney disease detection; International Urology and Nephrology; 2024-05-15

4. Predicting the Progression of Chronic Kidney Disease: A Systematic Review of Artificial Intelligence and Machine Learning Approaches; Cureus; 2024-05-12

5. Artificial intelligence and machine learning trends in kidney care; The American Journal of the Medical Sciences; 2024-05