Abstract
Motivation
With data on protein sequence and structure accumulating rapidly, this study addresses the following question: to what extent can current data alone reveal deep insights into the sequence-structure relationship, such that new sequences can be designed accordingly for novel structure folds?

Results
We have developed novel deep generative models, constructed a low-dimensional and generalizable representation of fold space, exploited sequence data with and without paired structures, and developed an ultra-fast fold predictor as an oracle providing feedback. The resulting semi-supervised gcWGAN is assessed with the oracle over 100 novel folds not in the training set and is found to achieve higher yields and cover 3.6 times more target folds than a competing data-driven method (cVAE). Assessed with a structure predictor over representative novel folds (including one that is not even among the basis folds), gcWGAN designs are found to have comparable or better fold accuracy, yet much more sequence diversity and novelty, than cVAE designs. gcWGAN explores uncharted sequence space to design proteins by learning from current sequence-structure data. The ultra-fast data-driven model can be a powerful addition to principle-driven design methods by generating seed designs or tailoring sequence space.

Availability
Data and source code will be available upon request.

Contact
yshen@tamu.edu

Supplementary information
Supplementary data are available at Bioinformatics online.
Publisher
Cold Spring Harbor Laboratory
Cited by
6 articles.