Optimization of Gene Selection for Cancer Classification in High-Dimensional Data Using an Improved African Vultures Algorithm
Published: 2024-08-06
Issue: 8
Volume: 17
Page: 342
ISSN: 1999-4893
Container-title: Algorithms
Language: en
Short-container-title: Algorithms
Author:
Gafar Mona G. (1, 2), Abohany Amr A. (3), Elkhouli Ahmed E. (4), El-Mageed Amr A. Abd (5)
Affiliation:
1. Department of Computer Engineering and Information, College of Engineering in Wadi Alddawasir, Prince Sattam bin Abdulaziz University, Kharj 16278, Saudi Arabia
2. Machine Learning and Information Retrieval Department, Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh 33511, Egypt
3. Faculty of Computers and Information, Kafrelsheikh University, Kafrelsheikh 33511, Egypt
4. Department of Biomedical Engineering, Faculty of Electrical Engineering, Menofia University, Menofia 32951, Egypt
5. Department of Information Systems, Sohag University, Sohag 82511, Egypt
Abstract
This study presents RBAVO-DE (Relief Binary African Vultures Optimization based on Differential Evolution), a novel method for the Gene Selection (GS) problem in high-dimensional RNA-Seq data, specifically the rnaseqv2 IlluminaHiSeq rnaseqv2 unc edu Level 3 RSEM genes normalized dataset, which contains over 20,000 genes. RNA Sequencing (RNA-Seq) enables comprehensive quantification and characterization of gene expression and offers a more detailed view of it than microarray technologies. Quantitative gene expression analysis can be pivotal in identifying genes that differentiate normal from malignant tissues. However, managing these high-dimensional dense matrices presents significant challenges. RBAVO-DE is designed to select the most informative genes from more than 20,000 candidates, and the relevance of the selected genes is assessed across twenty-two cancer datasets. To determine the effectiveness of the selected genes, this study employs the Support Vector Machine (SVM) and k-Nearest Neighbor (k-NN) classifiers. RBAVO-DE outperforms binary versions of widely recognized meta-heuristic algorithms, with the comparison assessed by Wilcoxon's rank-sum test at a 5% significance level. It achieves up to 100% classification accuracy and reduces the feature size by up to 98% in most of the twenty-two cancer datasets examined. These results suggest that RBAVO-DE can improve the precision of gene selection for cancer research and support more accurate and efficient identification of key genetic markers.
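The abstract does not spell out how a candidate gene subset is scored, so the following is only a minimal sketch of a wrapper-style objective of the kind commonly used in binary meta-heuristic feature selection: a weighted sum of classification error and the fraction of genes kept. The function names `knn_loo_accuracy` and `mask_fitness`, the weight `alpha`, the leave-one-out protocol, and the synthetic data are all assumptions made for illustration; a plain NumPy k-NN stands in for the classifiers used in the study.

```python
import numpy as np

def knn_loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a plain k-NN classifier (squared Euclidean distance)."""
    # Pairwise squared distances; fine for toy data, memory-hungry for large matrices.
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sum(diffs ** 2, axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample may not vote for itself
    neighbors = np.argsort(dist, axis=1)[:, :k]
    correct = 0
    for i, idx in enumerate(neighbors):
        labels, counts = np.unique(y[idx], return_counts=True)
        correct += int(labels[np.argmax(counts)] == y[i])  # majority vote
    return correct / len(y)

def mask_fitness(mask, X, y, alpha=0.99, k=3):
    """Lower is better. alpha is an assumed weight, not a value taken from the paper."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0  # an empty gene subset is treated as the worst case
    error = 1.0 - knn_loo_accuracy(X[:, mask], y, k=k)
    return alpha * error + (1.0 - alpha) * mask.sum() / mask.size

# Synthetic stand-in for an expression matrix: 40 samples, 50 "genes",
# of which only the first two carry class information.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 50))
X[:, 0] += 6.0 * y
X[:, 1] -= 6.0 * y

informative = np.zeros(50, dtype=int)
informative[:2] = 1
full = np.ones(50, dtype=int)

print("two informative genes:", mask_fitness(informative, X, y))
print("all genes:            ", mask_fitness(full, X, y))
```

In a binary optimizer, each candidate solution would be one such 0/1 mask, and the search would try to minimize this score; the small second term is what rewards dropping genes once accuracy stops improving.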
Funder
Prince Sattam bin Abdulaziz University