ASPIRER: a new computational approach for identifying non-classical secreted proteins based on deep learning

Author:

Wang Xiaoyu1, Li Fuyi2, Xu Jing1, Rong Jia3, Webb Geoffrey I3, Ge Zongyuan4, Li Jian5, Song Jiangning3

Affiliation:

1. Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia

2. Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Victoria, Australia

3. Department of Data Science and AI, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia

4. Monash e-Research Centre and Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia

5. Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia

Abstract

Protein secretion has a pivotal role in many biological processes and is particularly important for intercellular communication, from the cytoplasm to the host or external environment. Gram-positive bacteria can secrete proteins through multiple secretion pathways. Among these, the non-classical secretion pathway has recently received increasing attention, but its exact mechanism remains unclear. Non-classical secreted proteins (NCSPs) are a class of secreted proteins lacking signal peptides and motifs. Several predictors have been proposed to identify NCSPs, and most of them use the whole amino acid sequence to construct the model. However, sequence length varies greatly between proteins. In addition, not all regions of a protein are equally important, and some local regions are not relevant to secretion. The functional regions of the protein, particularly the N- and C-terminal regions, contain important determinants for secretion. In this study, we propose a new hybrid deep learning-based framework, referred to as ASPIRER, which improves the prediction of NCSPs from amino acid sequences. More specifically, it combines a whole sequence-based XGBoost model and an N-terminal sequence-based convolutional neural network model. Five-fold cross-validation and independent tests demonstrate that ASPIRER outperforms existing state-of-the-art approaches. The source code and curated datasets of ASPIRER are publicly available at https://github.com/yanwu20/ASPIRER/. ASPIRER is anticipated to be a useful tool for improved prediction of novel putative NCSPs from sequence information and for prioritization of candidate proteins for follow-up experimental validation.
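The hybrid idea described in the abstract (one model that sees the whole sequence, one that sees only the N-terminal region) can be illustrated with a minimal sketch. This is not the authors' implementation: the window length of 60, the padding symbol `X`, the use of amino acid composition as the whole-sequence feature, and the weighted average in `combine` are all assumptions chosen for illustration, and the two probabilities are taken as given rather than produced by trained XGBoost or CNN models.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol for short sequences


def n_terminal_window(seq, length=60):
    """Return the first `length` residues, right-padded to a fixed size.

    A fixed-size window sidesteps the variable-length problem; the
    value 60 is an assumption, not a setting taken from the paper.
    """
    return seq[:length].ljust(length, PAD)


def one_hot(window):
    """Encode a window as rows of 20 values; PAD/unknown rows are all zeros."""
    return [[1.0 if aa == res else 0.0 for aa in AMINO_ACIDS] for res in window]


def amino_acid_composition(seq):
    """Length-independent whole-sequence feature: frequency of each residue."""
    counts = Counter(seq)
    total = sum(counts[aa] for aa in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[aa] / total for aa in AMINO_ACIDS]


def combine(p_whole, p_nterm, weight=0.5):
    """Blend two model probabilities; the weighting scheme is an assumption."""
    return weight * p_whole + (1.0 - weight) * p_nterm
```

In such a setup, `amino_acid_composition` would feed the whole-sequence model and `one_hot(n_terminal_window(seq))` would feed the convolutional model, with `combine` merging their outputs into one score.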

Funder

Monash University

National Institutes of Health

Australian Research Council

National Health and Medical Research Council

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

