Affiliation:
1. School of Electronic and Communication Engineering, Shenzhen Polytechnic
2. Peking University
3. Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
4. School of Management, Shenzhen Polytechnic
Abstract
Interactions between proteins and nucleic acids play an important role in many processes, such as transcription, translation and DNA repair. The mechanisms of related biological events can be understood by exploring the function of proteins in these interactions. The number of known protein sequences has increased rapidly in recent years, but the databases describing protein structure and function have grown much more slowly. Thus, improving such databases is meaningful for predicting protein–nucleic acid interactions. Furthermore, understanding the function of proteins in these interactions sheds light on related biological events, such as viral infection, and can support the design of novel drug targets. The information for each sequence, including its function and interaction sites, was collected and identified, and a database called PNIDB was built. The proteins in PNIDB were grouped into 27 classes, such as transcription, immune system and structural protein. The function of each protein was then predicted using a machine learning method: a predictor was trained on labeled sequences, and the function of a protein was then predicted with the trained classifier. The prediction accuracy reached 77.43% under 10-fold cross-validation.
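The evaluation protocol described above (train on labeled sequences, predict function, score by 10-fold cross-validation) can be illustrated with a minimal sketch. The abstract does not specify the features or the classifier, so everything concrete here is an assumption made for illustration: amino acid composition as the feature vector, a nearest-centroid classifier as a stand-in for the machine learning method, and the hypothetical names `composition`, `train_centroids`, `predict` and `cross_validate`. The toy sequences and labels are invented and are not PNIDB data.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def train_centroids(features, labels):
    """Compute the mean feature vector of each class (stand-in classifier)."""
    sums, sizes = {}, Counter(labels)
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [v / sizes[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist(centroids[y]))


def cross_validate(seqs, labels, k=10, seed=0):
    """Return the overall accuracy of k-fold cross-validation."""
    features = [composition(s) for s in seqs]
    idx = list(range(len(seqs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        centroids = train_centroids(
            [features[i] for i in train], [labels[i] for i in train]
        )
        correct += sum(predict(centroids, features[i]) == labels[i] for i in fold)
    return correct / len(seqs)


# Invented toy data: lysine/arginine-rich vs. aspartate/glutamate-rich sequences.
toy_seqs = ["KR" * 5 + "A" * i for i in range(1, 11)] + [
    "DE" * 5 + "A" * i for i in range(1, 11)
]
toy_labels = ["basic"] * 10 + ["acidic"] * 10
accuracy = cross_validate(toy_seqs, toy_labels, k=10)
print(f"10-fold cross-validation accuracy: {accuracy:.2%}")
```

Each sample is held out exactly once, so the reported accuracy is the fraction of all sequences classified correctly while excluded from training. This is the sense in which a figure such as 77.43% is usually read, although the exact fold assignment and averaging used by the authors are not stated in the abstract.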
Funder
Natural Science Foundation of China
Natural Science Foundation of Guangdong Province
Science and Technology Innovation Commission of Shenzhen
Publisher
Oxford University Press (OUP)
Subject
Molecular Biology, Information Systems
Cited by
110 articles.