Using an Optimal Set of Features with a Machine Learning-Based Approach to Predict Effector Proteins for Legionella pneumophila

Author:

Ashari Zhila Esna, Brayton Kelly A., Broschat Shira L.

Abstract

Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells, changing the host environment to make it hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, their results are inconsistent with one another. Previously we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This work focuses on the best way to use these optimal features by designing three machine learning classifiers, comparing our results with those of others, and obtaining de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires' disease, because it has many validated effector proteins and because others have developed machine learning prediction tools for it. All of our models give good results, indicating that our optimal features are quite robust, but Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 760 effector proteins, more than any other study, 315 of which have been validated. Although the results of our three models agree well with those of other researchers, their models predicted only 126 and 311 candidate effectors.
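The comparison summarised in the abstract (how many predicted candidates are already validated, and how closely two tools' candidate lists agree) comes down to set arithmetic. The following is a minimal sketch under stated assumptions, not the authors' code: the function names `compare_predictions` and `jaccard`, the choice of Jaccard similarity as an agreement measure, and the toy identifiers are all hypothetical and chosen only for illustration.

```python
def compare_predictions(predicted, validated):
    """Summarise a predicted effector set against a validated set."""
    predicted, validated = set(predicted), set(validated)
    confirmed = predicted & validated
    return {
        "predicted": len(predicted),              # all candidates from the model
        "confirmed": len(confirmed),              # candidates already validated
        "novel": len(predicted - validated),      # candidates not yet validated
        "missed": len(validated - predicted),     # validated effectors not predicted
    }


def jaccard(a, b):
    """Jaccard similarity between two candidate sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data: made-up placeholder identifiers, not real results.
validated = {"lpg0001", "lpg0002", "lpg0003", "lpg0004"}
model_1 = {"lpg0001", "lpg0002", "lpg0003", "lpg0010", "lpg0011"}
other_tool = {"lpg0001", "lpg0002", "lpg0020"}

summary = compare_predictions(model_1, validated)
agreement = jaccard(model_1, other_tool)
print(summary)
print(round(agreement, 3))
```

With real data, the sets would hold the protein identifiers reported by each tool. A large "novel" count, as with Model 1's candidates that are not yet validated, marks targets for experimental follow-up rather than confirmed effectors.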

Publisher

Cold Spring Harbor Laboratory

