Identification and Segregation of Genes with Improved Recurrent Neural Network Trained with Optimal Gene Level and Mutation Level Features

Authors:

Pukhta Irfan Rashid¹, Rout Ranjeet Kumar¹

Affiliation:

1. National Institute of Technology

Abstract

Background

Genes contain the sequences of chemical bases that encode proteins and thus underpin the foundations of life. Mutations are changes in a gene that can affect the function of the encoded protein. When a mutation causes uncontrolled cellular proliferation, cancer arises. In tumor progression, mutations classified as drivers confer a growth advantage, whereas passenger mutations do not.

Methods

The goal of this research is to develop an effective classification system for discriminating between driver and passenger mutations. This article presents a new gene identification and segregation model with five primary steps: (a) pre-processing, (b) treatment of class imbalance, (c) feature extraction, (d) feature selection, and (e) gene classification. To improve data quality, the raw data are first pre-processed through data cleaning and data normalization, which transforms them into a usable and effective form. In practice, the dataset is skewed: driver mutation labels appear in far fewer instances than passenger mutation labels. To tackle this class imbalance, the pre-processed data are balanced using an enhanced K-Means + SMOTE (Synthetic Minority Over-sampling Technique) method. The most significant characteristics, namely gene-level features and mutation-level features, are then extracted from the balanced dataset. To reduce computation time, the optimal features are selected from the extracted features using Forensic Interpretation Customized Hunger Food Search Optimization (FIHFSO), which conceptually combines the standard Hunger Games Search (HGS) and Forensic-Based Investigation Optimization (FBIO). The features selected by FIHFSO are used to train the deep learning classifier that performs the segregation.
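The pre-processing and class-balancing steps (a) and (b) can be illustrated with a minimal pure-Python sketch. The abstract does not define its cleaning or normalization, so this sketch assumes two things: "data cleaning" means dropping rows with missing values, and "data normalization" means min-max scaling to [0, 1]. The balancing step is a simplified version of the general K-Means SMOTE idea. It clusters the data with k-means, keeps the clusters in which driver samples are well represented, and then creates synthetic driver samples by SMOTE interpolation inside those clusters. It is not the authors' enhanced variant, because the abstract does not describe that variant's changes. The function names, the `min_share` threshold, the round-robin allocation of synthetic samples across clusters, and the example data are all hypothetical.

```python
import math
import random


def clean(X, y):
    """Assumed cleaning step (hypothetical): drop samples containing a None value."""
    keep = [i for i, row in enumerate(X) if all(v is not None for v in row)]
    return [X[i] for i in keep], [y[i] for i in keep]


def min_max_normalize(X):
    """Assumed normalization: scale each feature column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]


def kmeans_labels(X, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the cluster index of every sample."""
    centers = [list(p) for p in rng.sample(X, k)]
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in X]
        for c in range(k):
            members = [p for p, a in zip(X, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def smote_sample(group, k, rng):
    """One SMOTE point: interpolate between a random minority sample and one of
    its k nearest minority neighbours (group must hold at least 2 samples)."""
    i = rng.randrange(len(group))
    base = group[i]
    others = sorted((q for j, q in enumerate(group) if j != i),
                    key=lambda q: math.dist(base, q))
    neighbour = rng.choice(others[:k])
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


def kmeans_smote(X, y, minority=1, n_clusters=3, min_share=0.3, k=3, seed=0):
    """Simplified K-Means SMOTE (not the authors' enhanced version): oversample the
    minority class up to the majority count, only inside minority-rich clusters."""
    rng = random.Random(seed)
    assign = kmeans_labels(X, n_clusters, rng)
    groups = []
    for c in range(n_clusters):
        members = [i for i, a in enumerate(assign) if a == c]
        mino = [X[i] for i in members if y[i] == minority]
        if len(mino) >= 2 and len(mino) >= min_share * len(members):
            groups.append(mino)
    if not groups:  # fall back to plain SMOTE over all minority samples
        groups = [[x for x, lab in zip(X, y) if lab == minority]]
    n_min = y.count(minority)
    n_new = (len(y) - n_min) - n_min  # synthetic samples needed to match the majority
    X_new, y_new = [list(x) for x in X], list(y)
    for t in range(max(n_new, 0)):  # hypothetical round-robin over the kept clusters
        X_new.append(smote_sample(groups[t % len(groups)], k, rng))
        y_new.append(minority)
    return X_new, y_new


# Usage with hypothetical data: label 1 = driver (minority), 0 = passenger.
X_raw = [[0.1, 5.0], [0.2, 4.0], [None, 3.0], [0.9, 1.0], [0.8, 2.0],
         [0.85, 1.5], [0.15, 4.5], [0.3, 3.5], [0.95, 1.2], [0.25, 4.2]]
y_raw = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
X_c, y_c = clean(X_raw, y_raw)
X_bal, y_bal = kmeans_smote(min_max_normalize(X_c), y_c)
print(y_bal.count(1), y_bal.count(0))  # 6 6
```

Clustering before interpolation keeps the synthetic driver samples inside regions that real driver samples already occupy. This is the usual motivation for combining k-means with SMOTE instead of interpolating across the whole feature space.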
The classifier is a new improved Recurrent Neural Network (I-RNN), introduced in this study to make the final decision for each gene (i.e., to classify it as a driver or passenger gene). Finally, the proposed model is validated to demonstrate its superior classification performance.

Results

The I-RNN model was compared with existing classifiers, namely CNN, LSTM, DBN, Bi-GRU, SVM, DRIVE (Dragomir et al., 2021), and EARN (Mirsadeghi et al., 2021). The I-RNN model recorded the highest accuracy, 95.5%, outperforming all of these models. This improvement is mainly due to the mean squared error (MSE) loss function introduced within the I-RNN. The I-RNN model also recorded the lowest false positive rate (FPR) and false negative rate (FNR).

Conclusion

Owing to its comparatively high accuracy, the proposed model is highly significant for gene classification. The quantitative identification and segregation of passenger and driver genes in cancer datasets will contribute to precision medicine in oncology.
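The classification stage and the metrics reported above can be illustrated with a minimal pure-Python sketch. It is a plain Elman RNN forward pass, not the authors' I-RNN, because the abstract does not give that model's architectural changes. The sketch rests on three assumptions:

- each gene's selected features are presented as a short sequence of feature vectors;
- a sigmoid output gives the probability of the driver class;
- the MSE loss is the mean squared difference between the 0/1 label and that probability.

Weight training (backpropagation through time) is omitted. `init_params`, `rnn_forward`, `mse_loss`, `rates`, and the example data are hypothetical. The metric definitions are the standard ones: accuracy = (TP + TN) / N, FPR = FP / (FP + TN), and FNR = FN / (FN + TP), with the driver class treated as positive.

```python
import math
import random


def init_params(n_in, hidden, seed=0):
    """Small random weights for a one-layer Elman RNN (hypothetical stand-in for the I-RNN)."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.5, 0.5)

    return {
        "Wx": [[u() for _ in range(n_in)] for _ in range(hidden)],    # input -> hidden
        "Wh": [[u() for _ in range(hidden)] for _ in range(hidden)],  # hidden -> hidden
        "b": [0.0] * hidden,
        "v": [u() for _ in range(hidden)],                            # hidden -> output
        "c": 0.0,
    }


def rnn_forward(sequence, p):
    """h_t = tanh(Wx x_t + Wh h_{t-1} + b); returns sigmoid(v . h_T + c) as P(driver)."""
    h = [0.0] * len(p["b"])
    for x_t in sequence:
        h = [math.tanh(sum(w * x for w, x in zip(p["Wx"][i], x_t))
                       + sum(w * hj for w, hj in zip(p["Wh"][i], h))
                       + p["b"][i])
             for i in range(len(h))]
    z = sum(vi * hi for vi, hi in zip(p["v"], h)) + p["c"]
    return 1.0 / (1.0 + math.exp(-z))


def mse_loss(y_true, y_prob):
    """Mean squared error between 0/1 labels and predicted probabilities."""
    return sum((t - q) ** 2 for t, q in zip(y_true, y_prob)) / len(y_true)


def rates(y_true, y_pred):
    """Accuracy, false positive rate and false negative rate, with 1 = driver (positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    accuracy = (tp + tn) / len(pairs)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return accuracy, fpr, fnr


# Usage with hypothetical data: each gene is a sequence of two 2-dimensional feature vectors.
params = init_params(n_in=2, hidden=4)
genes = [[[0.1, 0.9], [0.2, 0.8]], [[0.9, 0.1], [0.8, 0.3]]]
labels = [1, 0]
probs = [rnn_forward(g, params) for g in genes]
print("MSE loss (untrained weights):", mse_loss(labels, probs))

# Metric check on fixed labels: 2 TP, 2 TN, 1 FP, 1 FN.
print(rates([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1]))  # accuracy 4/6, FPR 1/3, FNR 1/3
```

A lower FPR means fewer passenger genes are flagged as drivers. A lower FNR means fewer true drivers are missed. These are the two error rates on which the abstract reports the I-RNN performing best.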

Publisher

Research Square Platform LLC

References (43 articles)

1. Ming-Jun Shi, Xiang-Yu Meng, Jacqueline Fontugne, Chun-Long Chen, François Radvanyi & Isabelle Bernard-Pierrot, "Identification of new driver and passenger mutations within APOBEC-induced hotspot mutations in bladder cancer", Genome Medicine, 2020

2. Daniele Raimondi, Antoine … & Yves Moreau, "Current cancer driver variant predictors learn to recognize driver genes instead of functional variants", BMC Biology, 2021

3. J. Xing, Y. Fang, W. Zhang, H. Zhang, D. Tang & D. Wang, "Bacterial driver–passenger model in biofilms: a new mechanism in the development of colorectal cancer", Clinical and Translational Oncology, 2022

4. Seyed Mohammad Razavi, Farzaneh Rami, Seyede Houri Razavi & Changiz Eslahchi, "TOPDRIVER: the novel identifier of cancer driver genes in Gastric cancer and Melanoma", Applied Network Science, 2019

5. Leila Mirsadeghi, Reza Haji Hosseini, Ali Mohammad Banaei-Moghaddam & Kaveh Kavousi, "EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer", BMC Medical Genomics, 2021
