Abstract
Pre-trained models have been transformative in natural language, computer vision, and now protein sequences by enabling accuracy with few training examples. We show how to use pre-trained sequence models in Bayesian optimization to design new protein sequences with minimal labels (i.e., few experiments). Pre-trained models give good predictive accuracy at low data and Bayesian optimization guides the choice of which sequences to test. Pre-trained sequence models also remove the common requirement of having a list of possible experiments. Any sequence can be considered. We show significantly fewer labeled sequences are required for three sequence design tasks, including creating novel peptide inhibitors with AlphaFold. These de novo peptide inhibitors require only sequence information, no known protein-protein structures, and we can predict highly-efficient binders with less than 10 AlphaFold calculations.
Publisher
Cold Spring Harbor Laboratory
Cited by
9 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献