Comprehensive assessment of machine learning-based methods for predicting antimicrobial peptides

Author:

Xu Jing1, Li Fuyi2, Leier André3, Xiang Dongxu1, Shen Hsin-Hui4, Marquez Lago Tatiana T5, Li Jian6, Yu Dong-Jun7, Song Jiangning8

Affiliation:

1. Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia

2. Department of Microbiology and Immunology, the Peter Doherty Institute for Infection and Immunity, the University of Melbourne, Australia

3. Department of Genetics, UAB School of Medicine, USA

4. Department of Biochemistry & Molecular Biology and Department of Materials Science & Engineering, Monash University, Australia

5. Departments of Genetics and Microbiology, UAB School of Medicine, USA

6. Monash Biomedicine Discovery Institute and Department of Microbiology, Monash University, Australia

7. School of Computer Science and Engineering, Nanjing University of Science and Technology, China

8. Monash Biomedicine Discovery Institute, Monash University, Australia

Abstract

Antimicrobial peptides (AMPs) are a unique and diverse group of molecules that play a crucial role in a myriad of biological processes and cellular functions. AMP-related studies have become increasingly popular in recent years because antimicrobial resistance is an emerging global concern. Systematic experimental identification of AMPs faces many difficulties due to the limitations of current methods. Given the significance of AMPs, more than 30 computational methods have been developed for their accurate prediction. These approaches differ widely in data set size, data quality, core algorithms, feature extraction, feature selection techniques and evaluation strategies. Here, we provide a comprehensive survey of current approaches for AMP identification and point out the differences between these methods. In addition, we evaluate the predictive performance of the surveyed tools on an independent test data set containing 1536 AMPs and 1536 non-AMPs. Furthermore, we construct six validation data sets from six common AMP databases and compare the computational methods on these data sets. The results indicate that amPEPpy achieves the best predictive performance and outperforms the other compared methods. As predictive performance is affected by the different data sets used by different methods, we additionally perform 5-fold cross-validation to benchmark different traditional machine learning methods on the same data set. These cross-validation results indicate that random forest, support vector machine and eXtreme Gradient Boosting achieve comparatively better performance than other machine learning methods and are often the algorithms of choice in AMP prediction tools.
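
The 5-fold cross-validation benchmark mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which features or implementation were used, so everything below is an assumption made for illustration: the amino-acid composition features, the helper names (`aac`, `stratified_kfold`, `cross_validate`, `toy_data`), the synthetic sequences (not real AMPs), and the nearest-centroid classifier, which is a toy stand-in and not one of the random forest, support vector machine or eXtreme Gradient Boosting models that the paper benchmarks.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino-acid composition: fraction of each standard residue in seq."""
    counts = Counter(seq)
    return [counts.get(a, 0) / len(seq) for a in AMINO_ACIDS]


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class ratios similar per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal indices round-robin into folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_validate(seqs, labels, k=5, seed=0):
    """Per-fold accuracy of a nearest-centroid classifier (toy stand-in model)."""
    X = [aac(s) for s in seqs]
    accuracies = []
    for train, test in stratified_kfold(labels, k, seed):
        # Fit: one centroid per class, using training folds only.
        cents = {
            cls: centroid([X[i] for i in train if labels[i] == cls])
            for cls in set(labels)
        }
        # Evaluate on the held-out fold.
        correct = 0
        for i in test:
            pred = min(cents, key=lambda c: sq_dist(X[i], cents[c]))
            correct += pred == labels[i]
        accuracies.append(correct / len(test))
    return accuracies


def toy_data(n_per_class=20, length=20, seed=1):
    """Synthetic sequences only: cationic-rich 'positives', acidic-rich 'negatives'."""
    rng = random.Random(seed)
    pos = ["".join(rng.choice("KRKRLAGW") for _ in range(length))
           for _ in range(n_per_class)]
    neg = ["".join(rng.choice("DEDESTGA") for _ in range(length))
           for _ in range(n_per_class)]
    return pos + neg, [1] * n_per_class + [0] * n_per_class


if __name__ == "__main__":
    seqs, labels = toy_data()
    accs = cross_validate(seqs, labels, k=5)
    print("per-fold accuracy:", accs)
    print("mean accuracy:", sum(accs) / len(accs))
```

Evaluating every model on the same folds of the same data set is what makes the comparison fair: differences in score then reflect the learning algorithm and not the data each tool happened to be trained on.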

Funder

National Health and Medical Research Council of Australia

National Natural Science Foundation of China

Australian Research Council

Institute for Chemical Research, Kyoto University

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems
