Structure-Based Protein Function Prediction using Graph Convolutional Networks

Published: 2019-10-04

Authors:

Gligorijevic Vladimir, Renfrew P. Douglas, Kosciolek Tomasz, Leman Julia Koehler, Berenberg Daniel, Vatanen Tommi, Chandler Chris, Taylor Bryn C., Fisk Ian M., Vlamakis Hera, Xavier Ramnik J., Knight Rob, Cho Kyunghyun, Bonneau Richard

Abstract

The large number of available sequences and the diversity of protein functions challenge current experimental and computational approaches to determining and predicting protein function. We present a deep learning Graph Convolutional Network (GCN) for predicting protein functions and concurrently identifying functionally important residues. This model is initially trained using experimentally determined structures from the Protein Data Bank (PDB) but has significant denoising capability, with only a minor drop in performance observed when structure predictions are used. We take advantage of this denoising property to train the model on more than 200,000 protein structures, including many homology-predicted structures, greatly expanding the reach and applications of the method. Our model learns general structure-function relationships by robustly predicting functions of proteins with ≤ 40% sequence identity to the training set. We show that our GCN architecture predicts functions more accurately than Convolutional Neural Networks trained on sequence data alone and previous competing methods. Using class activation mapping, we automatically identify structural regions at the residue level that lead to each function prediction for every confidently predicted protein, advancing site-specific function prediction. We use our method to annotate PDB and SWISS-MODEL proteins, making several new confident function predictions spanning both fold and function classifications.
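The abstract names two ideas that a small example may make more concrete: graph convolution over a protein structure, and class activation mapping to score individual residues. The sketch below is illustrative only and is not the authors' implementation. It assumes a residue graph built from C-alpha coordinates with a 10 Å distance cutoff, a single Kipf and Welling style graph convolution, sum pooling over residues, and sigmoid outputs for multi-label prediction. All function names, the cutoff, and the weight shapes are hypothetical choices made for illustration.

```python
import math

def contact_map(coords, cutoff=10.0):
    """Binary residue-residue adjacency from 3D coordinates.

    Assumption: two residues are connected when their coordinates
    (e.g. C-alpha atoms) are closer than `cutoff` angstroms.
    """
    n = len(coords)
    return [[1.0 if math.dist(coords[i], coords[j]) < cutoff else 0.0
             for j in range(n)] for i in range(n)]

def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    n = len(adj)
    a_hat = [[1.0 if i == j else adj[i][j] for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def predict_with_saliency(coords, features, w_gcn, w_out):
    """Return per-class probabilities and per-residue class contributions.

    With sum pooling followed by a linear layer, the logit of class c is the
    sum over residues of (h_i . w_out[:, c]), so each residue's term can be
    read as its contribution to that class (a CAM-like map).
    """
    a_norm = normalize(contact_map(coords))
    h = gcn_layer(a_norm, features, w_gcn)        # (n_residues, hidden)
    per_residue = matmul(h, w_out)                # (n_residues, n_classes)
    logits = [sum(col) for col in zip(*per_residue)]
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return probs, per_residue

# Toy usage: four residues on a line, two-dimensional residue features,
# three hidden units, two hypothetical function classes.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
w_gcn = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
w_out = [[1.0, -1.0], [0.5, 0.2], [-0.3, 0.8]]
probs, per_residue = predict_with_saliency(coords, features, w_gcn, w_out)
print(probs)
print(per_residue)
```

In this simplified setting the per-residue contributions add up exactly to the class logits, which is what makes them usable as a residue-level importance map. A trained model would learn `w_gcn` and `w_out` from data and would typically stack several graph convolution layers.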

Publisher

Cold Spring Harbor Laboratory

Cited by 24 articles.

1. CryptoCEN: A Co-Expression Network for Cryptococcus neoformans reveals novel proteins involved in DNA damage repair;2023-08-18

2. Mixed structure- and sequence-based approach for protein graph neural networks with application to antibody developability prediction;2023-06-28

3. Exploring the utility of regulatory network-based machine learning for gene expression prediction in maize;2023-05-14

4. Machine Learning for Cyber-Physical Systems;Digital Transformation;2023

5. Neural representations of cryo-EM maps and a graph-based interpretation;BMC Bioinformatics;2022-09-28