Affiliation:
1. Department of IT Convergence Engineering, Gachon University, Seongnam-daero 1342, Seongnam-si 13120, Republic of Korea
2. Department of Food Science and Biotechnology, Gachon University, Seongnam-daero 1342, Sujeong-gu, Seongnam-si 13120, Republic of Korea
3. Department of Computer Engineering, Gachon University, Seongnam-daero 1342, Seongnam-si 13120, Republic of Korea
Abstract
Parkinson’s disease is a neurodegenerative disease associated with genetic and environmental factors. However, the genes causing this degeneration have not been determined, and no cure has been reported. Recent studies have used machine learning to classify diseases from RNA-seq data, and accurate machine-learning-based diagnosis is becoming an important task. In this study, we examined how various feature selection methods can improve the performance of machine learning for accurate diagnosis of Parkinson’s disease. In addition, we compared the performance metrics and computational costs of running the models with and without each feature selection method. Experiments were conducted on data from RNA sequencing, a technique that profiles the transcriptome of an organism using next-generation sequencing. A genetic algorithm (GA), information gain (IG), and the wolf search algorithm (WSA) were employed as feature selection methods. Four machine learning algorithms were used as classifiers: extreme gradient boosting (XGBoost), deep neural network (DNN), support vector machine (SVM), and decision tree (DT). The models were evaluated using accuracy, precision, recall, F1 score, and the receiver operating characteristic (ROC) curve. For XGBoost and DNN, feature selection methods based on GA, IG, and WSA improved performance by 10.00% and 38.18%, respectively. For SVM and DT, feature selection methods based on IG and WSA improved performance by 0.91% and 7.27%, respectively. The results demonstrate that feature selection improves the performance of machine learning when classifying Parkinson’s disease using RNA-seq data.
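To make the information gain (IG) step concrete, the following is a minimal sketch of IG-based feature ranking in plain Python. It is not the authors' implementation: the abstract does not say how expression values were discretized, so the median split used here is an assumption, and the function names (`entropy`, `information_gain`, `select_top_k`) and the toy expression matrix are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG of one continuous feature after a binary split at its median.

    The median split is an assumption made for this sketch.
    """
    cut = median(values)
    low = [y for v, y in zip(values, labels) if v <= cut]
    high = [y for v, y in zip(values, labels) if v > cut]
    n = len(labels)
    conditional = (len(low) / n) * entropy(low) + (len(high) / n) * entropy(high)
    return entropy(labels) - conditional


def select_top_k(X, y, k):
    """Return the indices of the k highest-IG features and all IG scores."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy expression matrix (made-up numbers): rows are samples, columns are genes.
X = [
    [9.1, 3.0, 5.2],
    [8.7, 7.1, 5.0],
    [9.4, 2.9, 5.1],
    [2.0, 7.3, 5.3],
    [1.8, 3.1, 4.9],
    [2.2, 6.8, 5.2],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = patient, 0 = control (hypothetical labels)

selected, scores = select_top_k(X, y, k=1)
print(selected, [round(s, 3) for s in scores])
```

In this toy data only the first gene separates the two classes, so it receives the highest score and is the one retained. In a pipeline like the one described, the retained columns would then be passed to a classifier such as XGBoost, DNN, SVM, or DT.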
Funder
Ministry of Education of the Republic of Korea
Korean government
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
2 articles.