RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins

Author:

Peng Xinxin (1,2), Wang Xiaoyu (1,2), Guo Yuming (3), Ge Zongyuan (4), Li Fuyi (1,5,6), Gao Xin (7,8), Song Jiangning (1,2)

Affiliation:

1. Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia

2. Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia

3. Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria 3004, Australia

4. Monash e-Research Centre and Faculty of Engineering, Monash University, Melbourne, Victoria 3800, Australia

5. Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia

6. College of Information Engineering, Northwest A&F University, Yangling, 712100, China

7. Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia

8. KAUST Computational Bioscience Research Center, King Abdullah University of Science and Technology

Abstract

RNA-binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in a myriad of biological processes, such as RNA localization and gene regulation. Computational methods capable of accurately identifying RBPs are therefore highly desirable and have important implications for biomedical and biotechnological applications. Here, we propose RBP-TSTL, a two-stage deep transfer learning-based framework for accurate prediction of RBPs. In the first stage, knowledge from a self-supervised pre-trained model is extracted as feature embeddings that represent the protein sequences. In the second stage, a customized deep learning model is initialized by training on an annotated pre-training RBP dataset and then fine-tuned on each target species dataset. This two-stage transfer learning strategy allows the RBP-TSTL models to be trained effectively and improves their prediction performance. Extensive benchmarking of RBP-TSTL models trained on features generated by the self-supervised pre-trained model against models trained on hand-crafted encoding features demonstrated the effectiveness of the proposed two-stage knowledge transfer strategy. Using the best-performing RBP-TSTL models, we further conducted genome-scale RBP predictions for Homo sapiens, Arabidopsis thaliana, Escherichia coli and Salmonella, and established a computational compendium containing all the predicted putative RBP candidates. We anticipate that RBP-TSTL will serve as a useful tool for characterizing RNA-binding proteins and exploring their sequence–structure–function relationships.
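The abstract describes the second-stage transfer only at a high level. As a hedged illustration of the general pre-train-then-fine-tune idea (not the authors' implementation), the sketch below swaps in plain logistic regression as a stand-in for the customized deep learning model, and uses synthetic Gaussian vectors from a hypothetical `make_toy_embeddings` helper in place of the stage-one embeddings produced by the self-supervised pre-trained model. A classifier is first trained on a larger "source" RBP dataset; its weights then initialize training on a small target-species dataset with a lower learning rate and fewer epochs.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    # Probability that embedding x belongs to an RBP (label 1)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(X, y, weights=None, bias=0.0, lr=0.1, epochs=50, seed=0):
    """Logistic regression trained by SGD on the log loss.

    If `weights` is given, training starts from a copy of them (fine-tuning);
    otherwise it starts from zeros (training from scratch)."""
    rng = random.Random(seed)
    w = list(weights) if weights is not None else [0.0] * len(X[0])
    b = bias
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = predict_proba(w, b, X[i]) - y[i]  # gradient of log loss w.r.t. the logit
            for j in range(len(w)):
                w[j] -= lr * err * X[i][j]
            b -= lr * err
    return w, b


def accuracy(w, b, X, y):
    correct = sum((predict_proba(w, b, x) >= 0.5) == bool(t) for x, t in zip(X, y))
    return correct / len(y)


def make_toy_embeddings(n, dim, shift, seed):
    """Hypothetical stand-in for stage-one embeddings: Gaussian vectors whose
    mean is +shift for RBPs (label 1) and -shift for non-RBPs (label 0)."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n):
        label = k % 2
        center = shift if label else -shift
        X.append([rng.gauss(center, 1.0) for _ in range(dim)])
        y.append(label)
    return X, y


# Stage 2a (assumed setup): initialize on a larger, general source RBP dataset
X_src, y_src = make_toy_embeddings(400, 8, 1.0, seed=1)
w_pre, b_pre = train(X_src, y_src, lr=0.05, epochs=20)

# Stage 2b: fine-tune on a small target-species dataset, starting from the
# pre-trained weights and using a smaller learning rate and fewer epochs
X_tgt, y_tgt = make_toy_embeddings(40, 8, 0.8, seed=2)
w_ft, b_ft = train(X_tgt, y_tgt, weights=w_pre, bias=b_pre, lr=0.01, epochs=5)
```

Starting from the pre-trained weights rather than from zeros is what makes this a transfer step: the small target set only has to adjust an already reasonable decision boundary. The learning rates, epoch counts and dataset sizes are arbitrary illustrative choices, not values from the paper.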

Funder

National Health and Medical Research Council of Australia

Australian Research Council

National Institute of Allergy and Infectious Diseases

National Institutes of Health

Major Inter-Disciplinary Research

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

Cited by 18 articles.
