Authors:
Jawaid M. Zaki, Yeo Robin W., Gautam Aayushma, Gainous T. Blair, Hart Daniel O., Daley Timothy P.
Abstract
Designing novel functional proteins remains a slow and expensive process due to a variety of protein engineering challenges; in particular, the number of protein variants that can be experimentally tested in a given assay pales in comparison to the vastness of the overall sequence space, resulting in low hit rates and expensive wet lab testing cycles. In this paper, we propose a few-shot learning approach to novel protein design that aims to accelerate the expensive wet lab testing cycle and is capable of leveraging a training dataset that is both small and skewed (≈ 10^5 datapoints, < 1% positive hits). Our approach is composed of two parts: a semi-supervised transfer learning approach to generate a discrete fitness landscape for a desired protein function and a novel evolutionary Monte Carlo Markov Chain sampling algorithm to more efficiently explore the fitness landscape. We demonstrate the performance of our approach by experimentally screening predicted high fitness gene activators, resulting in a dramatically improved hit rate compared to existing methods. Our method can be easily adapted to other protein engineering and design problems, particularly where the cost associated with obtaining labeled data is significantly high. We have provided open source code for our method at https://github.com/SuperSecretBioTech/evolutionary_monte_carlo_search.
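To make the sampling idea in the abstract concrete, the following is a minimal sketch of a generic Metropolis-style search over sequence space. It is not the authors' algorithm and does not reflect the code in their repository: the function names, the single-point-mutation proposal, the temperature value, and the toy fitness function (which stands in for a learned fitness landscape) are all assumptions made for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_fitness(seq, target):
    """Hypothetical stand-in for a learned fitness model:
    the fraction of positions at which seq matches target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq, rng):
    """Propose a single point mutation at a uniformly chosen position.
    The proposal is symmetric, so plain Metropolis acceptance applies."""
    pos = rng.randrange(len(seq))
    new_aa = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
    return seq[:pos] + new_aa + seq[pos + 1:]


def metropolis_search(start, fitness, steps=2000, temperature=0.05, seed=0):
    """Run a Metropolis chain and return the best sequence visited."""
    rng = random.Random(seed)
    current, current_fit = start, fitness(start)
    best, best_fit = current, current_fit
    for _ in range(steps):
        candidate = mutate(current, rng)
        cand_fit = fitness(candidate)
        delta = cand_fit - current_fit
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = candidate, cand_fit
            if current_fit > best_fit:
                best, best_fit = current, current_fit
    return best, best_fit


# Usage with an arbitrary, made-up target sequence.
target = "MKTAYIAKQR"
start = "A" * len(target)
best, score = metropolis_search(start, lambda s: toy_fitness(s, target))
```

In a realistic setting the toy fitness function would be replaced by the predictions of a trained model, and occasionally accepting lower-fitness moves is what lets the chain leave local optima of that landscape.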
Publisher
Cold Spring Harbor Laboratory