Genome-scale annotation of protein binding sites via language model and geometric deep learning

Published: 2023-11-05

Author:

Yuan Qianmu, Tian Chong, Yang Yuedong

Abstract

Identifying the sites where proteins bind other molecules, such as nucleic acids, peptides, or small ligands, sheds light on disease mechanisms and supports novel drug design. With the explosive growth of proteins in sequence databases, accurately and efficiently identifying these binding sites from sequences has become essential. However, current methods mostly rely on expensive multiple sequence alignments or experimental protein structures, which limits their genome-scale application. Moreover, these methods have not fully explored the geometry of protein structures. Here, we propose GPSite, a multi-task network for simultaneously predicting the binding residues of DNA, RNA, peptide, protein, ATP, HEM, and metal ions on proteins. GPSite was trained on informative sequence embeddings and predicted structures from protein language models, while comprehensively extracting residual and relational geometric contexts in an end-to-end manner. Experiments demonstrate that GPSite substantially surpasses state-of-the-art sequence-based and structure-based approaches on various benchmark datasets, even when the structures are not well predicted. The low computational cost of GPSite enables rapid genome-scale binding residue annotation for over 568,000 sequences, providing opportunities to unveil unexplored associations of binding sites with molecular functions, biological processes, and genetic variants. The GPSite webserver and annotation database can be freely accessed at https://bio-web1.nscc-gz.cn/app/GPSite.

Publisher

Cold Spring Harbor Laboratory

References (79 articles)

1. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

2. A Proteome-Scale Map of the Human Interactome Network

3. Metalloproteomes: A Bioinformatic Approach

4. Predicting protein function from sequence and structure

5. JAK2-binding long noncoding RNA promotes breast cancer brain metastasis. The Journal of Clinical Investigation, 2017

Cited by 1 article.

1. AMAPEC: accurate antimicrobial activity prediction for fungal effector proteins. 2024-01-04