AutoPeptideML: Automated Machine Learning for Building Trustworthy Peptide Bioactivity Predictors

Published: 2023-11-15

Author:

Fernandez-Diaz Raul, Cossio-Pérez Rodrigo, Agoni Clement, Lam Hoang Thanh, Lopez Vanessa, Shields Denis C.

Abstract

Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build trustworthy models. We consider the design of such an AutoML tool for developing peptide bioactivity predictors. We analyse different design choices concerning data acquisition and negative class definition, homology partitioning for the construction of independent evaluation sets, the use of protein language models as a general sequence representation method, and model selection and hyperparameter optimisation. We have found that the definition of the negative class has a significant impact on the perceived performance of the models, with differences of up to 40%; the use of homology partitioning leads to stricter evaluation, with drops of up to 50% in perceived performance; the use of protein language models achieves state-of-the-art performance across different tasks; and the introduction of hyperparameter optimisation enables simpler machine learning models to perform similarly to more complex architectures. Finally, we integrate the conclusions drawn from this study into AutoPeptideML, an end-to-end, user-friendly application that enables experimental researchers to build trustworthy models, facilitating compliance with community guidelines. The source code, documentation, and data are available at https://github.com/IBM/AutoPeptideML and a dedicated web-server is available at http://peptide.ucd.ie/AutoPeptideML.

Publisher

Cold Spring Harbor Laboratory
