Abstract
Motivation: Protein design has become increasingly important for medical and biotechnological applications. Because of the complex mechanisms underlying protein formation, creating a novel protein requires tedious and time-consuming computational or experimental protocols. At the same time, machine learning has made it possible to solve complex problems by leveraging the large amounts of available data, most recently with great improvements in the domain of generative modeling. Yet generative models have mainly been applied to specific sub-problems of protein design.
Results: Here we approach the problem of general-purpose protein design conditioned on functional labels of the hierarchical Gene Ontology. Since a canonical way to evaluate generative models in this domain is missing, we devise an evaluation scheme of several biologically and statistically inspired metrics. We then develop the conditional generative adversarial network ProteoGAN and show that it outperforms several classic and more recent deep learning baselines for protein sequence generation. We further give insights into the model by analysing hyperparameters and ablation baselines. Lastly, we hypothesize that a functionally conditional model could create proteins with novel functions by combining labels, and we provide first steps in this direction of research.
Availability: Code and data are available at https://github.com/timkucera/proteogan
Contact: tim.kucera@bsse.ethz.ch, mt@visum.ch, lpapaxanthos@google.com
Publisher
Cold Spring Harbor Laboratory