Affiliation:
1. Bioinformatics Centre, National Institute of Immunology, New Delhi 110067, India
Abstract
Abstract
Motivation
Even though genome mining tools have successfully identified large numbers of non-ribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) biosynthetic gene clusters (BGCs) in bacterial genomes, currently no tool can predict the chemical structure of the secondary metabolites biosynthesized by these BGCs. Lack of algorithms for predicting complex macrocyclization patterns of linear PK/NRP biosynthetic intermediates has been the major bottleneck in deciphering the final bioactive chemical structures of PKs/NRPs by genome mining.
Results
Using a large dataset of known chemical structures of macrocyclized PKs/NRPs, we have developed a machine learning (ML) algorithm for distinguishing the correct macrocyclization pattern of PKs/NRPs from the library of all theoretically possible cyclization patterns. Benchmarking of this ML classifier on completely independent datasets has revealed ROC–AUC and PR–AUC values of 0.82 and 0.81, respectively. This cyclization prediction algorithm has been used to develop SBSPKSv3, a genome mining tool for completely automated prediction of macrocyclized structures of NRPs/PKs. SBSPKSv3 has been extensively benchmarked on a dataset of over 100 BGCs with known PKs/NRPs products.
Availability and implementation
The macrocyclization prediction pipeline and all the datasets used in this study are freely available at http://www.nii.ac.in/sbspks3.html.
Supplementary information
Supplementary data are available at Bioinformatics online.
Funder
National Institute of Immunology
Department of Biotechnology
BTIS project
COE
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability
Cited by
10 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献