ProtienCNN‐BLSTM: An efficient deep neural network with amino acid embedding‐based model of protein sequence classification and biological analysis

Author:

Lilhore Umesh Kumar (1), Simaiya Sarita (1), Dalal Surjeet (2), Faujdar Neetu (3), Sharma Yogesh Kumar (4), Rao K. B. V. Brahma (4), Maheswara Rao V. V. R. (5), Tomar Shilpi (6), Ghith Ehab (7), Tlija Mehdi (8)

Affiliation:

1. Department of Computer Science and Engineering, Galgotias University, Greater Noida, India

2. Department of Computer Science and Engineering, Amity University Haryana, Gurugram, India

3. Department of Computer Engineering and Applications, GLA University, Mathura, India

4. Department of Computer Science & Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, India

5. Department of Computer Science and Engineering, Shri Vishnu Engineering College for Women (A), Bhimavaram, India

6. Department of Electrical Engineering, Samrat Ashok Technological Institute, Vidisha, India

7. Department of Mechatronics, Faculty of Engineering, Ain Shams University, Cairo, Egypt

8. Department of Industrial Engineering, College of Engineering, King Saud University, Riyadh, Saudi Arabia

Abstract

Protein sequence classification must be fast and accurate to support advances in bioinformatics and the development of pharmaceutical products. Traditional protein classification methods require extensive comparisons between unknown sequences and large databases of known proteins, which can be time‐consuming. This manual matching and classification, which relies on functional and biological commonalities, is slow and labour‐intensive. Protein classification is one of the many fields that deep learning has recently revolutionized. Protein data are organized hierarchically and sequentially, and advanced algorithms such as the Deep Family‐based Method (DeepFam) and the Protein Convolutional Neural Network (ProtCNN) have shown promising results in classifying proteins into related groups. However, these methods frequently fail to account for this hierarchical and sequential organization. To overcome these challenges, we propose a novel hybrid model called ProteinCNN‐BLSTM. To classify protein sequences more accurately, it combines amino acid embedding with bidirectional long short‐term memory (BLSTM) and convolutional neural networks (CNNs). The CNN component is best suited to capturing local features, while the BLSTM component is best suited to modeling long‐term dependencies across protein sequences. Amino acid embedding transforms protein sequences into numeric vectors, which significantly improves feature representation and prediction precision. We evaluated the proposed ProteinCNN‐BLSTM model against existing deep‐learning models on the standard protein datasets PDB‐14189 and PDB‐2272. The proposed model was more accurate than the existing models it was compared with, namely CNN, LSTM, GCNs, CNN‐LSTM, RNNs, GCN‐RNN, DeepFam, and ProtCNN.
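
The amino acid embedding step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the vocabulary, embedding dimension, or padding scheme the authors used, so the names `encode`, `make_embedding_table`, and `embed`, the reserved padding and unknown-residue ids, and the random initialisation are all assumptions made for illustration. In a trained model, the rows of the lookup table would be learned and the resulting vectors passed on to the CNN and BLSTM layers.

```python
import random

# The 20 standard amino acids (one-letter codes). Reserving id 0 for padding
# and id 21 for unknown residues (e.g. 'X') is an assumption of this sketch,
# not a detail taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD_ID = 0
UNK_ID = len(AMINO_ACIDS) + 1
TOKEN_TO_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence, max_len):
    """Map a protein sequence to a fixed-length list of integer ids.

    Sequences longer than max_len are truncated; shorter ones are padded.
    """
    ids = [TOKEN_TO_ID.get(aa, UNK_ID) for aa in sequence.upper()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def make_embedding_table(dim, seed=0):
    """Create a randomly initialised lookup table with one row per token id.

    A real model would learn these rows during training.
    """
    rng = random.Random(seed)
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(UNK_ID + 1)]
    table[PAD_ID] = [0.0] * dim  # padding positions map to the zero vector
    return table


def embed(ids, table):
    """Replace each token id with its embedding vector."""
    return [table[i] for i in ids]


if __name__ == "__main__":
    ids = encode("MKTAYX", max_len=8)
    vectors = embed(ids, make_embedding_table(dim=4))
    print(ids)                            # [11, 9, 17, 1, 20, 21, 0, 0]
    print(len(vectors), len(vectors[0]))  # 8 4
```

Each sequence becomes a `max_len` by `dim` matrix of numbers, which is the kind of input a convolutional layer followed by a bidirectional recurrent layer could consume.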

Funder

King Saud University

Publisher

Wiley
