Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods

Author:

Li Fuyi [1], Wang Yanan [1,2], Li Chen [1,3], Marquez-Lago Tatiana T [4], Leier André [4], Rawlings Neil D [5], Haffari Gholamreza [6], Revote Jerico [1], Akutsu Tatsuya [7], Chou Kuo-Chen [8,9], Purcell Anthony W [1], Pike Robert N [10,11], Webb Geoffrey I [6], Ian Smith A [1,11], Lithgow Trevor [12], Daly Roger J [1], Whisstock James C [1,11], Song Jiangning [1,6,11]

Affiliation:

1. Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia

2. Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China

3. Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich 8093, Switzerland

4. Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA

5. EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK

6. Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia

7. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan

8. Gordon Life Science Institute, Boston, MA 02478, USA

9. Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China

10. La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia

11. ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia

12. Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, Victoria 3800, Australia

Abstract

The roles of proteolytic cleavage have been intensively investigated over the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as the cell cycle, protein regulation and inflammation, and many studies have aimed to decipher its mechanisms. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been developed to predict protease-specific substrates and their cleavage sites. There is therefore an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction, both to advance the existing methodologies and to improve prediction performance. With this goal in mind, we evaluated a total of 19 computational methods (8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithms, calculated features, performance evaluation and software usability. We then performed extensive independent tests to assess the robustness and scalability of the reviewed methods, using carefully prepared independent test data sets containing 3641 cleavage sites specific to 10 proteases. The comparative results show that PROSPERous is the most accurate generic method for predicting the cleavage sites of eight proteases, while GPS-CCD and LabCaS outperformed the other predictors for calpain-specific cleavage sites. Based on our review, we outline potential ways to improve prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive-unlabeled learning, and parallel and distributed computing techniques. We anticipate that our study will serve as a practical guide for readers interested in developing next-generation bioinformatics tools for protease-specific cleavage site prediction.
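
As an illustration of what an independent test of a cleavage site predictor involves, the following is a minimal sketch in Python. It is not the benchmarking code used in the study. The abstract does not state which metrics were reported, so the choice of sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC) is an assumption, as are the function names, the 0.5 score threshold and the toy labels and scores.

```python
import math


def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, FP, TN, FN for binary labels and real-valued scores.

    labels: 1 for an experimentally verified cleavage site, 0 otherwise.
    scores: predictor outputs; a site is called positive if score >= threshold.
    The threshold value is an illustrative assumption.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def benchmark_metrics(labels, scores, threshold=0.5):
    """Return common binary classification metrics as a dictionary."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    total = tp + fp + tn + fn
    # Guard every ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Toy data (hypothetical): eight candidate sites scored by one predictor.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
toy_result = benchmark_metrics(toy_labels, toy_scores)
print(toy_result)
```

Running the same function over each predictor's scores on a shared, held-out set of sites gives directly comparable numbers. MCC is often preferred when negatives greatly outnumber positives, which is typical for cleavage site data.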

Funder

National Health and Medical Research Council of Australia

Australian Research Council

National Institute of Allergy and Infectious Diseases of the National Institutes of Health

Monash University

Collaborative Research Program of Institute for Chemical Research, Kyoto University

NHMRC CJ Martin Early Career Research Fellowship

ARC Discovery Outstanding Research Award

Informatics Institute of the School of Medicine at University of Alabama at Birmingham

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems


Cited by 72 articles.