Abstract
Enzymes that cleave ATP to activate carboxylic acids play essential roles in primary and secondary metabolism in all domains of life. Class I adenylate-forming enzymes share a conserved structural fold but act on a wide range of substrates to catalyze reactions involved in bioluminescence, nonribosomal peptide biosynthesis, fatty acid activation, and β-lactone formation. Despite their metabolic importance, the substrates and catalytic functions of the vast majority of adenylate-forming enzymes remain unknown, and no tools are available to predict them accurately. Given the crucial roles of adenylate-forming enzymes in biosynthesis, this gap also severely limits our ability to predict natural product structures from biosynthetic gene clusters. Here we used machine learning to predict adenylate-forming enzyme function and substrate specificity from protein sequence. We built a web-based predictive tool and used it to comprehensively map the biochemical diversity of adenylate-forming enzymes across >50,000 candidate biosynthetic gene clusters in bacterial, fungal, and plant genomes. Ancestral enzyme reconstruction and sequence similarity networking revealed a ‘hub’ topology suggesting radial divergence of the adenylate-forming superfamily from a core enzyme scaffold most related to contemporary aryl-CoA ligases. Our classifier also predicted β-lactone synthetases in novel biosynthetic gene clusters conserved across >90 different strains of Nocardia. To test our computational predictions, we purified a candidate β-lactone synthetase from Nocardia brasiliensis and reconstituted the biosynthetic pathway in vitro to link the gene cluster to the β-lactone natural product, nocardiolactone. We anticipate our machine learning approach will aid in functional classification of enzymes and advance natural product discovery.
Publisher
Cold Spring Harbor Laboratory
Cited by
2 articles.