Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework

Author:

Zhang Yanju (1), Xie Ruopeng (1), Wang Jiawei (2), Leier André (3, 4), Marquez-Lago Tatiana T (3, 4), Akutsu Tatsuya (5), Webb Geoffrey I (6), Chou Kuo-Chen (7, 8), Song Jiangning (6, 9, 10)

Affiliation:

1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China

2. Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia

3. Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA

4. Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA

5. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan

6. Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia

7. Gordon Life Science Institute, Boston, MA 02478, USA

8. Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China

9. Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia

10. ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, VIC 3800, Australia

Abstract

As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of different features and machine learning (ML) methods that are required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed [Light Gradient Boosting Machine (LightGBM)] are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/.
We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.
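The pipeline summarized in the abstract (encode the residues around each candidate lysine, score the site with several models, combine the scores, evaluate by AUC) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not list the 11 encodings or describe the ensemble rule, so binary (one-hot) encoding and a weighted average of per-model scores are stand-ins, and the function names `extract_windows`, `binary_encode`, `ensemble_score` and `auc` are hypothetical rather than taken from kmal-sp.

```python
from typing import List, Optional, Sequence, Tuple

# The 20 standard amino acids; any other symbol (e.g. the pad "X") encodes as all zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_windows(sequence: str, flank: int = 3, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine, padding the termini."""
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of the sequence sits at index i + flank in the padded string.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


def binary_encode(window: str) -> List[int]:
    """One-hot encode a window: 20 indicator values per residue (assumed encoding)."""
    vector: List[int] = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def ensemble_score(scores: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-model scores by a weighted average (assumed ensemble rule)."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))
```

In this sketch each window would be encoded once and passed to every trained model, and the per-model probabilities for a site would then be merged with `ensemble_score` before computing `auc` on held-out data. The pairwise AUC is quadratic in the number of sites, which is acceptable for illustration but not for large benchmark sets.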

Funder

Natural Science Foundation of Guangxi

Innovation Project of Guilin University of Electronic Technology Graduate Education

Australian Research Council

National Institute of Allergy and Infectious Diseases of the National Institutes of Health

Monash University

Discovery Outstanding Research Award

Informatics Institute of the School of Medicine at University of Alabama at Birmingham

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

Cited by 81 articles.
