Abstract
AbstractIn the past, several methods have been developed for predicting antibacterial and antimicrobial peptides, but only limited attempts have been made to predict their minimum inhibitory concentration (MIC) values. In this study, we trained our models on 3,143 peptides and validated them on 786 peptides whose MIC values have been determined experimentally againstEscherichia coli(E. coli). The correlational analysis reveals that the Composition Enhanced Transition and Distribution (CeTD) attributes strongly correlate with MIC values. We initially employed the similarity search strategy utilizing BLAST to estimate MIC values of peptides but found it inadequate for prediction. Next, we developed machine learning techniques-based regression models using a wide range of features, including peptide composition, binary profile, and embeddings of large language models. We implemented feature selection techniques like minimum Redundancy Maximum Relevance (mRMR) to select the best relevant features for developing prediction models. Our Random forest-based regressor, based on selected features, achieved a correlation coefficient (R) of 0.78, R-squared (R²) of 0.59, and a root mean squared error (RMSE) of 0.53 on the validation dataset. Our best model outperforms the existing methods when benchmarked on an independent dataset of 498 inhibitory peptides ofE. coli. One of the major features of the web-based platform EIPpred developed in this study is that it allows users to identify or design peptides that can inhibitE. coliwith the desired MIC value (https://webs.iiitd.edu.in/raghava/eippred).HighlightsPrediction of MIC value of peptides againstE.coli.An independent dataset was generated for comparison.Feature selection using the mRMR method.A regressor method for designing novel inhibitory peptides.A web server and standalone package for predicting the inhibitory activity of peptides.
Publisher
Cold Spring Harbor Laboratory