Identification of Enzymes-specific Protein Domain Based on DDE, and Convolutional Neural Network

Published:2021-11-30 Volume:12
ISSN:1664-8021
Container-title:Frontiers in Genetics
Short-container-title:Front. Genet.

Author:

Sikander Rahu, Wang Yuping, Ghulam Ali, Wu Xianjuan

Abstract

Distinguishing enzymes from non-enzymes on the basis of protein sequence information is an important but very challenging task. Existing methods use either protein geometric structures alone or protein sequences alone to predict enzymatic function, and their prediction results are therefore unsatisfactory. In this paper, we propose a novel approach for classifying amino acid sequences as enzymes or non-enzymes with a Convolutional Neural Network (CNN). The CNN predicts the roles of enzymes from multiple sources of biological information, including sequence and structure information. We feed two-dimensional data to a 2D CNN and evaluate the enzyme and non-enzyme predictions with fivefold cross-validation. We also test the model on an independent dataset, and the results indicate that the overfitting problem is addressed. The proposed CNN was evaluated with 32, 64, and 128 filters, using both the fivefold cross-validation sets and the independent test set. The Dipeptide Deviation from Expected Mean (DDE) matrix is used to extract mutation information from amino acid sequences and to convey structural information, namely the distances and angles of amino acids. The derived feature maps are then encoded using DDE. On the independent dataset, the model is compared with two other methods, namely GRU and XGBoost. The model achieved an accuracy of 0.8762 on the cross-validation datasets and 0.7621 on the independent dataset. The ROC AUC under fivefold cross-validation was 0.95. The sensitivity (0.9028) and specificity (0.8497) of our model were also compared with those of the other models. The overall accuracy of our model was 0.9133, compared with 0.8310 for the other model.
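
The abstract names DDE as the feature encoding but does not spell out the formula. The sketch below follows the commonly cited definition of Dipeptide Deviation from Expected Mean: the observed dipeptide composition is standardized by a theoretical mean and variance derived from codon counts in the standard genetic code. It is an illustration under that assumption, not the authors' implementation. The names `dde_features` and `as_matrix` are hypothetical, and reshaping the 400 values into a 20 x 20 grid is only one plausible way to obtain two-dimensional input for a 2D CNN.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Number of codons encoding each amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}
TOTAL_CODONS = 61  # 64 codons minus the 3 stop codons


def dde_features(sequence):
    """Return the 400 DDE values for a protein sequence (hypothetical helper)."""
    seq = sequence.upper()
    # Keep only dipeptides made of two standard residues (a simplification).
    pairs = [a + b for a, b in zip(seq, seq[1:])
             if a in CODON_COUNTS and b in CODON_COUNTS]
    n = len(pairs)
    if n == 0:
        raise ValueError("sequence contains no standard dipeptide")

    counts = {}
    for dipeptide in pairs:
        counts[dipeptide] = counts.get(dipeptide, 0) + 1

    features = []
    for a, b in product(AMINO_ACIDS, repeat=2):
        dc = counts.get(a + b, 0) / n                 # observed composition
        tm = (CODON_COUNTS[a] / TOTAL_CODONS) * (CODON_COUNTS[b] / TOTAL_CODONS)  # theoretical mean
        tv = tm * (1 - tm) / n                        # theoretical variance
        features.append((dc - tm) / math.sqrt(tv))    # deviation from expected mean
    return features


def as_matrix(features):
    """Reshape 400 values into 20 rows of 20 (assumed 2D CNN input layout)."""
    return [features[i * 20:(i + 1) * 20] for i in range(20)]
```

Row `i` and column `j` of the resulting grid correspond to the dipeptide formed by the `i`-th and `j`-th letters of `AMINO_ACIDS`, so dipeptides that occur more often than expected get positive values and absent ones get negative values.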

Funder

National Natural Science Foundation of China-China Academy of General Technology Joint Fund for Basic Research

Publisher

Frontiers Media SA

Subject

Genetics (clinical), Genetics, Molecular Medicine

References: 56 articles.

1. TensorFlow: Learning Functions at Scale;Abadi,2016

2. Gapped BLAST and PSI-BLAST: a New Generation of Protein Database Search Programs;Altschul;Nucleic Acids Res.

3. Gapped BLAST and PSI-BLAST: a New Generation of Protein Database Search Programs;Altschul;Nucleic Acids Res.,1997

4. A Machine Learning Methodology for Enzyme Functional Classification Combining Structural and Protein Sequence Descriptors;Amidi,2016

5. UniProt: the Universal Protein Knowledgebase;Apweiler;Nucleic Acids Res.,2004

Cited by 4 articles.

1. DeepImmuno-PSSM: Identification of Immunoglobulin based on Deep learning and PSSM-Profiles;VAWKUM Transactions on Computer Sciences;2023-03-17

2. AFP-SPTS: An Accurate Prediction of Antifreeze Proteins Using Sequential and Pseudo-Tri-Slicing Evolutionary Features with an Extremely Randomized Tree;Journal of Chemical Information and Modeling;2023-01-17

3. Prediction of Amyloid Proteins Using Embedded Evolutionary & Ensemble Feature Selection Based Descriptors With eXtreme Gradient Boosting Model;IEEE Access;2023

4. Prediction of the Ibuprofen Loading Capacity of MOFs by Machine Learning;Bioengineering;2022-09-30