Accurately identifying nucleic-acid-binding sites through geometric graph learning on language model predicted structures

Author:

Song Yidong, Yuan Qianmu, Zhao Huiying, Yang Yuedong

Abstract

The interactions between nucleic acids and proteins are important in diverse biological processes, yet high-quality prediction of nucleic-acid-binding sites remains a significant challenge. At present, the predictive power of sequence-based methods is limited because they consider only sequence context information, whereas structure-based methods are unsuitable for proteins lacking known tertiary structures. Although protein structures predicted by AlphaFold2 could be used, the heavy computational requirements of AlphaFold2 hinder its use in genome-wide applications. Building on the recent breakthrough of ESMFold for fast protein structure prediction, we have developed GLMSite, which accurately identifies DNA- and RNA-binding sites using geometric graph learning on ESMFold-predicted structures. Here, the predicted protein structures are used to construct a protein structural graph with residues as nodes and spatially neighboring residue pairs as edges. The node representations are further enhanced with the pre-trained language model ProtTrans. The network was trained using a geometric vector perceptron, and the geometric embeddings were subsequently fed into a common network to acquire common binding characteristics. Two fully connected layers were then employed to learn the specific binding patterns for DNA and RNA, respectively. In comprehensive tests on DNA/RNA benchmark datasets, GLMSite was shown to surpass the latest sequence-based methods and to be comparable with structure-based methods. Moreover, the predictions were shown to be useful for inferring nucleic-acid-binding proteins, demonstrating the method's potential for protein function discovery. The datasets, code, and trained models are available at https://github.com/biomed-AI/nucleic-acid-binding.
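The graph construction described in the abstract (residues as nodes, spatially neighboring residue pairs as edges) can be illustrated with a minimal sketch. This is not the GLMSite implementation: the function name `build_residue_graph`, the use of one coordinate per residue, the distance cutoff, and the toy coordinates are all assumptions made for illustration only.

```python
import math


def build_residue_graph(coords, cutoff=10.0):
    """Return directed edges (i, j) between residues whose representative
    atoms lie within `cutoff` angstroms of each other.

    `coords` holds one (x, y, z) tuple per residue; the cutoff value is a
    hypothetical choice, not one taken from the paper.
    """
    edges = []
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            # Skip self-loops; keep pairs that are spatial neighbors.
            if i != j and math.dist(a, b) <= cutoff:
                edges.append((i, j))
    return edges


# Toy coordinates (made up): three residues along a line, one far away.
ca_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (30.0, 0.0, 0.0),
]

edges = build_residue_graph(ca_coords, cutoff=5.0)
print(edges)
```

With these toy coordinates and a 5 angstrom cutoff, only consecutive residues are linked and the distant fourth residue has no edges. In a real pipeline the coordinates would come from a predicted structure file, and each node would carry feature vectors such as language model embeddings.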

Publisher

Cold Spring Harbor Laboratory

