Large-scale comparative assessment of computational predictors for lysine post-translational modification sites

Authors:

Chen Zhen (1), Liu Xuhan (2), Li Fuyi (3, 4), Li Chen (3, 5), Marquez-Lago Tatiana (6, 7), Leier André (6, 7), Akutsu Tatsuya (8), Webb Geoffrey I (9), Xu Dakang (10, 11), Smith Alexander Ian (3, 4), Li Lei (1), Chou Kuo-Chen (12, 13), Song Jiangning (3, 4, 9)

Affiliations:

1. School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China

2. Medicinal Chemistry, Leiden Academic Centre for Drug Research, Einsteinweg, Leiden, The Netherlands

3. Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia

4. ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia

5. Institute of Molecular Systems Biology, ETH Zürich, Auguste-Piccard-Hof, Zürich, Switzerland

6. Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA

7. Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA

8. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan

9. Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia

10. Faculty of Medical Laboratory Science, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China

11. Department of Molecular and Translational Science, Faculty of Medicine, Hudson Institute of Medical Research, Monash University, Melbourne, VIC, Australia

12. Gordon Life Science Institute, Boston, MA, USA

13. Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China

Abstract

Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse protein functions and biological processes. However, because genome-sequencing projects generate large volumes of sequence data, systematically identifying the substrates and sites of different types of lysine PTMs across the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods differ widely in their core algorithms, extracted features, feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, in order to improve and further develop computational techniques for identifying and characterizing lysine PTMs from large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of aspects that are crucial for developing successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further offer our thoughts on potential strategies for improving model performance. Second, to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), which uses deep, bidirectional, long short-term memory (LSTM) recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction.
The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu/. We anticipate that this comprehensive review and the application of deep learning will provide a practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in related fields.
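The abstract describes MUscADEL only as deep, bidirectional LSTM recurrent networks applied to protein sequences to predict eight lysine PTM types. To make that kind of architecture concrete, the following is a minimal, untrained, pure-Python sketch rather than MUscADEL's actual model. The window half-width of 3, the padding symbol `X`, the 21-symbol one-hot encoding, the single BiLSTM layer, the hidden size and the eight independent sigmoid outputs are all illustrative assumptions, and every name (`lysine_windows`, `one_hot`, `LSTM`, `BiLSTMSitePredictor`) is hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
PAD = "X"  # hypothetical padding symbol for windows that run past the termini


def lysine_windows(seq, half_width=3):
    """Return (position, window) for every lysine (K), padded at the termini."""
    padded = PAD * half_width + seq + PAD * half_width
    windows = []
    for i, aa in enumerate(seq):
        if aa == "K":
            # Residue i of seq sits at i + half_width in padded, so this slice is centred on it.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def one_hot(window):
    """Encode each residue as a 21-dim vector: 20 amino acids plus one slot for PAD/unknown."""
    vecs = []
    for aa in window:
        v = [0.0] * (len(AMINO_ACIDS) + 1)
        idx = AMINO_ACIDS.find(aa)
        v[idx if idx >= 0 else len(AMINO_ACIDS)] = 1.0
        vecs.append(v)
    return vecs


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTM:
    """Single-direction LSTM layer with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x; h_prev]; biases start at 0.
        self.W = {gate: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def run(self, xs):
        """Consume the sequence xs and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in xs:
            z = x + h  # list concatenation [x; h_prev]
            pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


class BiLSTMSitePredictor:
    """Forward + backward LSTM over a one-hot window, then one sigmoid score per PTM type."""

    def __init__(self, n_ptm_types=8, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        input_dim = len(AMINO_ACIDS) + 1
        self.fwd = LSTM(input_dim, hidden_dim, rng)
        self.bwd = LSTM(input_dim, hidden_dim, rng)
        self.W_out = rand_matrix(n_ptm_types, 2 * hidden_dim, rng)
        self.b_out = [0.0] * n_ptm_types

    def predict(self, window):
        xs = one_hot(window)
        # Concatenate the final forward state with the final state of the reversed pass.
        h = self.fwd.run(xs) + self.bwd.run(xs[::-1])
        return [sigmoid(v + b) for v, b in zip(matvec(self.W_out, h), self.b_out)]


if __name__ == "__main__":
    model = BiLSTMSitePredictor()
    for pos, win in lysine_windows("MKVLAKGEKR"):
        print(pos, win, [round(s, 3) for s in model.predict(win)])
```

Because the weights are random and never trained, the printed scores carry no biological meaning; the sketch only shows how lysine-centred windows flow through a bidirectional recurrent encoder into eight independent per-type probabilities (a multi-label design choice assumed here, since one lysine can carry more than one modification type).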

Funders

Australian Research Council

National Natural Science Foundation of China

National Health and Medical Research Council

National Institute of Allergy and Infectious Diseases

Monash University

Kyoto University

Institute of the School of Medicine

University of Alabama at Birmingham

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems
