SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues

Author:

Zhang Bin,Hou Zilong,Yang Yuning,Wong Ka-chunORCID,Zhu HaoranORCID,Li XiangtaoORCID

Abstract

AbstractProteins and nucleic-acids are essential components of living organisms that interact in critical cellular processes. Accurate prediction of nucleic acid-binding residues in proteins can contribute to a better understanding of protein function. However, the discrepancy between protein sequence information and obtained structural and functional data renders most current computational models ineffective. Therefore, it is vital to design computational models based on protein sequence information to identify nucleic acid binding sites in proteins. Here, we implement an ensemble deep learning model-based nucleic-acid-binding residues on proteins identification method, called SOFB, which characterizes protein sequences by learning the semantics of biological dynamics contexts, and then develop an ensemble deep learning-based sequence network to learn feature representation and classification by explicitly modeling dynamic semantic information. Among them, the language learning model, which is constructed from natural language to biological language, captures the underlying relationships of protein sequences, and the ensemble deep learning-based sequence network consisting of different convolutional layers together with Bi-LSTM refines various features for optimal performance. Meanwhile, to address the imbalanced issue, we adopt ensemble learning to train multiple models and then incorporate them. Our experimental results on several DNA/RNA nucleic-acid-binding residue datasets demonstrate that our proposed model outperforms other state-of-the-art methods. In addition, we conduct an interpretability analysis of the identified nucleic acid binding residue sequences based on the attention weights of the language learning model, revealing novel insights into the dynamic semantic information that supports the identified nucleic acid binding residues. SOFB is available at https://github.com/Encryptional/SOFB and https://figshare.com/articles/online_resource/SOFB_figshare_rar/25499452.

Funder

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3