Abstract
Accurate identification of protein function is critical to elucidating life mechanisms and designing new drugs. We propose a novel deep-learning method, ATGO, that predicts the Gene Ontology (GO) attributes of proteins from their sequences through a triplet neural-network architecture built on embeddings from pre-trained protein language models. The method was systematically tested on 1068 non-redundant benchmark proteins and on 3328 targets from the third Critical Assessment of Protein Function Annotation (CAFA) challenge. Experimental results showed that ATGO achieved a significant increase in GO prediction accuracy over state-of-the-art approaches in all three GO aspects: molecular function, biological process, and cellular component. Detailed data analyses showed that the major advantage of ATGO lies in its use of pre-trained transformer language models, which can extract discriminative functional patterns from the feature embeddings. Meanwhile, the proposed triplet network helps strengthen the association between functional similarity and feature similarity in the sequence embedding space. In addition, combining the network scores with complementary homology-based inferences was found to further improve the accuracy of the predicted models. These results demonstrate a new avenue for high-accuracy deep-learning function prediction that is applicable to large-scale protein function annotation from sequence alone.
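The abstract names two mechanisms: a triplet network that relates functional similarity to distance in the embedding space, and a fusion of network scores with homology-based inferences. The Python sketch below illustrates both under stated assumptions and is not the ATGO implementation. It uses the standard triplet margin loss, in which an anchor embedding is pulled toward a functionally similar (positive) protein and pushed at least a margin away from a dissimilar (negative) one. The toy 2-D vectors stand in for language-model embeddings, and the helper names (`triplet_loss`, `combine_scores`), the margin, and the linear fusion weight are hypothetical choices made for illustration.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard triplet margin loss on Euclidean distances:
    # zero once the negative is at least `margin` farther from the anchor
    # than the positive; otherwise a linear penalty on the shortfall.
    # `margin=1.0` is an assumed value, not taken from the paper.
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def combine_scores(network, homology, weight=0.5):
    # Hypothetical linear fusion of per-GO-term confidence scores.
    # Terms missing from one source are treated as score 0.0.
    # `weight` is an assumed hyperparameter for the network scores.
    terms = set(network) | set(homology)
    return {
        term: weight * network.get(term, 0.0)
        + (1.0 - weight) * homology.get(term, 0.0)
        for term in terms
    }

if __name__ == "__main__":
    anchor = [0.0, 0.0]
    print(triplet_loss(anchor, [0.0, 1.0], [3.0, 4.0]))  # negative far away: no penalty
    print(triplet_loss(anchor, [0.0, 1.0], [1.0, 0.0]))  # negative too close: penalised
    print(combine_scores({"GO:0003824": 0.8},
                         {"GO:0003824": 0.4, "GO:0005515": 0.6}))
```

In the first call the positive lies at distance 1 and the negative at distance 5, so the margin is already satisfied and the loss is 0; in the second both lie at distance 1 and the loss equals the margin. The GO identifiers in the fusion example are used only as labels.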
Funders
China Scholarship Council
National Natural Science Foundation of China
Natural Science Foundation of Jiangsu
Foundation of National Defense Key Laboratory of Science and Technology
National Institute of General Medical Sciences
National Institute of Allergy and Infectious Diseases
National Science Foundation
Publisher
Public Library of Science (PLoS)
Subjects
Computational Theory and Mathematics, Cellular and Molecular Neuroscience, Genetics, Molecular Biology, Ecology, Modeling and Simulation, Ecology, Evolution, Behavior and Systematics
Cited by 24 articles.