Large-scale design and refinement of stable proteins using sequence-only models

Published:2022-03-14 Issue:3 Volume:17 Page:e0265020
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Singer Jedediah M., Novotney Scott, Strickland Devin, Haddox Hugh K., Leiby Nicholas, Rocklin Gabriel J., Chow Cameron M., Roy Anindya, Bera Asim K., Motta Francis C., Cao Longxing, Strauch Eva-Maria, Chidyausiku Tamuka M., Ford Alex, Ho Ethan, Zaitzeff Alexander, Mackenzie Craig O., Eramian Hamed, DiMaio Frank, Grigoryan Gevorg, Vaughn Matthew, Stewart Lance J., Baker David, Klavins Eric

Abstract

Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model—despite weaknesses including a noisy data set—can be used to substantially increase the stability of both expert-designed and model-generated proteins.

Funder

Defense Advanced Research Projects Agency

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

References: 79 articles.

1. Massively parallel de novo protein design for targeted therapeutics;A Chevalier;Nature,2017

2. De Novo Computational Design of Retro-Aldol Enzymes;L Jiang;Science,2008

3. Computational Design of Self-Assembling Protein Nanomaterials with Atomic Level Accuracy;NP King;Science,2012

4. The Rosetta all-atom energy function for macromolecular modeling and design;RF Alford;Journal of Chemical Theory and Computation,2017

5. Protein stability: computation, sequence statistics, and new experimental methods;TJ Magliery;Current Opinion in Structural Biology,2015

Cited by 17 articles.

1. Protein Language Models in Directed Evolution;2024-08-20

2. Optimisation strategies for directed evolution without sequencing;2024-03-19

3. Learning-Based Estimation of Fitness Landscape Ruggedness for Directed Evolution;2024-03-02

4. Tpgen: a language model for stable protein design with a specific topology structure;BMC Bioinformatics;2024-01-23

5. A new age in protein design empowered by deep learning;Cell Systems;2023-11