Accurate protein function prediction via graph attention networks with predicted structure information

Published:2021-12-09 Issue:1 Volume:23 Page:
ISSN:1467-5463
Container-title:Briefings in Bioinformatics
language:en

Author:

Lai Boqiao¹, Xu Jinbo¹

Affiliation:

1. Toyota Technological Institute at Chicago, Chicago, IL 60637, USA

Abstract

Experimental protein function annotation does not scale with the fast-growing sequence databases: only a tiny fraction (<0.1%) of protein sequences have experimentally determined functional annotations. Computational methods can predict protein function very quickly, but their accuracy remains unsatisfactory. Building on recent breakthroughs in protein structure prediction and protein language models, we develop GAT-GO, a graph attention network (GAT) method that may substantially improve protein function prediction by leveraging predicted structure information and protein sequence embeddings. Our experimental results show that GAT-GO greatly outperforms the latest sequence- and structure-based deep learning methods. On the PDB-mmseqs test set, where the training and test proteins share <15% sequence identity, GAT-GO yields Fmax (maximum F-score) of 0.508, 0.416, 0.501 and area under the precision-recall curve (AUPRC) of 0.427, 0.253, 0.411 for the MFO, BPO, CCO ontology domains, respectively. This is much better than the homology-based method BLAST (Fmax 0.117, 0.121, 0.207 and AUPRC 0.120, 0.120, 0.163), which does not use any structure information. On the PDB-cdhit test set, where the training and test proteins are more similar, GAT-GO obtains Fmax 0.637, 0.501, 0.542 and AUPRC 0.662, 0.384, 0.481 for the MFO, BPO, CCO ontology domains, respectively, even though it uses predicted structure information. These results significantly exceed those of the recently published method DeepFRI, which uses experimental structures and reaches Fmax 0.542, 0.425, 0.424 and AUPRC of only 0.313, 0.159, 0.193.
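The abstract reports Fmax without defining how it is computed. The sketch below illustrates the protein-centric maximum F-score as it is commonly defined in the CAFA evaluations: at each threshold, precision is averaged over proteins with at least one predicted term, recall is averaged over proteins with at least one annotated term, and Fmax is the best harmonic mean over all thresholds. The function name `fmax`, the threshold grid, and the array layout are assumptions made for illustration; the abstract does not state the authors' exact evaluation protocol.

```python
import numpy as np

def fmax(y_true, y_score, thresholds=None):
    """Protein-centric maximum F-score over decision thresholds (illustrative sketch).

    y_true:  (n_proteins, n_terms) binary array of annotated GO terms.
    y_score: (n_proteins, n_terms) array of predicted scores in [0, 1].
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    if thresholds is None:
        # Assumed grid; evaluations often sweep thresholds in steps of 0.01.
        thresholds = np.linspace(0.01, 0.99, 99)

    n_true = y_true.sum(axis=1)
    has_truth = n_true > 0  # recall is averaged over annotated proteins
    best = 0.0
    for t in thresholds:
        pred = y_score >= t
        n_pred = pred.sum(axis=1)
        n_hit = (pred & y_true).sum(axis=1)
        covered = n_pred > 0  # precision is averaged over proteins with predictions
        if not covered.any() or not has_truth.any():
            continue
        precision = (n_hit[covered] / n_pred[covered]).mean()
        recall = (n_hit[has_truth] / n_true[has_truth]).mean()
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Under this definition, a predictor that recovers one of a protein's two annotated terms with no false positives has precision 1 and recall 0.5, so its Fmax is 2/3.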

Funder

National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

Link

https://academic.oup.com/bib/article-pdf/23/1/bbab502/42231005/bbab502.pdf

References: 61 articles.

1. UniProt: the universal protein knowledgebase;UniProt Consortium;Nucleic Acids Res,2018

2. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens;Zhou et al.;Genome Biol,2019

3. An expanded evaluation of protein function prediction methods shows an improvement in accuracy;Jiang et al.;Genome Biol,2016

4. A large-scale evaluation of computational protein function prediction;Radivojac et al.;Nat Methods,2013

5. Predicting human protein function with multi-task deep neural networks;Fa;PLoS One,2018

Cited by 35 articles.

1. Identifying virulence factors using graph transformer autoencoder with ESMFold-predicted structures;Computers in Biology and Medicine;2024-03

2. Seq-InSite: sequence supersedes structure for protein interaction site prediction;Bioinformatics;2024-01-01

3. Protein function prediction using graph neural network with multi-type biological knowledge;2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);2023-12-05

4. Protein Function Prediction with Primary-Tertiary Hierarchical Learning;2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);2023-12-05

5. SLPFA: Protein Structure-Label Embedding Attention Network for Protein Function Annotation;2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);2023-12-05