Abstract
Polypeptides with multiple enzyme domains, such as type I polyketide synthases, synthesise chemically complex compounds that are difficult to make via conventional chemical synthesis and are often pharmaceutically or otherwise commercially valuable. Attempts to engineer polyketide synthases to generate novel polyketides, via domain swapping and/or site-directed mutagenesis, have tended to produce either low yields of product or no product at all. The success of such experiments may be limited by our inability to predict the key functional residues and boundaries of protein domains. Computational tools that identify domain boundaries and the residues determining the substrate specificity of domains could reduce the trial and error involved in engineering multi-domain proteins. In this study we use statistical coupling analysis to identify networks of co-evolving residues in type I polyketide synthases, thereby predicting domain boundaries. We extend the method to predict key residues for enzyme substrate specificity. We introduce bootstrapping calculations to test the relationship between sequence length and the number of sequences needed for a robust analysis. Our results show no simple predictor of the number of sequences needed for an analysis, which can range from as few as a hundred to as many as a few thousand. We find that polyketide synthases contain multiple networks of co-substituting residues: some are intradomain but most span multiple domains. Some networks of coupled residues correlate with specific functions such as the substrate specificity of the acyl transferase domain, the stereochemistry of the ketoreductase domain, or domain boundaries that are consistent with experimental data. Our extension of the method provides a ranking of the likely importance of these residues to enzyme substrate specificity, allowing us to propose residues for further mutagenesis work.
We conclude that analysis of co-evolving networks of residues is likely to be an important tool for re-engineering multi-domain proteins.
Author summary
Many important compounds, such as antibiotics or food flavourings, are produced naturally by molecular factories within plant, fungal and bacterial cells. These molecular factories typically comprise a complex of multiple interacting enzymes, each enzyme being a stage in a molecular production line. Often the enzymes are connected together as subsections of the same amino acid chain, i.e. protein, with the chain folding into the separate functional enzymatic domains that make up the production line. Polyketide synthases are such multi-domain proteins, and their products often have antibacterial, antifungal and antitumour effects. Engineering polyketide synthases thus has the potential to produce novel drug candidates. We applied and developed statistical approaches to detect where in an amino acid sequence the boundaries between different domains lie, potentially allowing these regions to be swapped around for the synthesis of novel compounds. We used the same approaches to identify parts of the amino acid chain that are important for the function of different types of domain, pointing to how they might be modified to make novel compounds. These analyses agree with published experimental data and allow us to make novel predictions, which we expect to help experimentalists produce novel compounds of commercial and pharmaceutical interest.
Publisher
Cold Spring Harbor Laboratory