Now What Sequence? Pre-trained Ensembles for Bayesian Optimization of Protein Sequences

Published: 2022-08-06

Authors:

Yang Ziyue, Milas Katarina A., White Andrew D.

Abstract

Pre-trained models have been transformative in natural language, computer vision, and now protein sequences by enabling accuracy with few training examples. We show how to use pre-trained sequence models in Bayesian optimization to design new protein sequences with minimal labels (i.e., few experiments). Pre-trained models give good predictive accuracy at low data, and Bayesian optimization guides the choice of which sequences to test. Pre-trained sequence models also remove the common requirement of having a list of possible experiments: any sequence can be considered. We show that significantly fewer labeled sequences are required for three sequence design tasks, including creating novel peptide inhibitors with AlphaFold. These de novo peptide inhibitors require only sequence information and no known protein-protein structures, and we can predict highly efficient binders with fewer than 10 AlphaFold calculations.

Publisher

Cold Spring Harbor Laboratory

Cited by 9 articles.

1. Design of intrinsically disordered protein variants with diverse structural properties;Science Advances;2024-08-30

2. Sample-efficient Antibody Design through Protein Language Model for Risk-aware Batch Bayesian Optimization;2023-11-08

3. Accelerating Biocatalysis Discovery with Machine Learning: A Paradigm Shift in Enzyme Engineering, Discovery, and Design;ACS Catalysis;2023-10-26

4. Design of intrinsically disordered protein variants with diverse structural properties;2023-10-24

5. Bayesian Optimization in Drug Discovery;Methods in Molecular Biology;2023-09-14