Machine Learning to Predict Continuous Protein Properties from Simple Binary Sorting and Deep Sequencing Data

Published: 2023-06-10

Authors: Marshall Case, Matthew Smith, Jordan Vinh, Greg Thurber

Abstract

Proteins are a diverse class of biomolecules responsible for a wide range of cellular functions, from catalyzing reactions and recognizing pathogens to forming dynamic cellular structures. Evolving proteins rapidly and inexpensively toward improved properties is a common objective for protein engineers. Powerful high-throughput methods such as fluorescence-activated cell sorting (FACS) and next-generation sequencing (NGS) have dramatically improved directed evolution experiments. However, it remains unclear how best to leverage these data to characterize protein fitness landscapes more completely and to identify lead candidates. In this work, we develop a simple yet powerful framework that improves protein optimization by using interpretable machine learning to predict continuous protein properties from simple directed evolution experiments. Across five diverse protein engineering tasks, the framework consistently predicted continuous properties from readily available deep sequencing data. To prospectively test the utility of this approach, we generated a library of stapled peptides and applied the framework to predict and optimize both affinity and specificity. We then coupled integer linear programming with the coefficients of the interpretable model to identify new variants with the desired properties from experimentally unseen sequence space. This approach offers a versatile tool for analyzing and identifying protein variants across many domains of protein engineering.
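The abstract's idea can be illustrated with a toy example. The sketch below is not the authors' pipeline. It assumes an additive (site-independent) logistic model on one-hot sequence features and treats each variant's fraction of cells in the "positive" sort gate as a soft binary label. The sorting data are simulated from made-up effects. All names (`ALPHABET`, `ON_TARGET`, `OFF_TARGET`, `fit_logistic`, the held-out set, the off-target cap) are hypothetical. The design step swaps brute-force enumeration in for integer linear programming, which only works because this toy space has just 64 variants.

```python
import itertools
import math

ALPHABET = "ACDE"  # hypothetical 4-letter alphabet to keep the space tiny
LENGTH = 3         # hypothetical 3-position library -> 4**3 = 64 variants

# Hypothetical "true" additive effects (log-odds units), used only to simulate sorts.
ON_TARGET = [
    {"A": 0.0, "C": 1.2, "D": -0.8, "E": 0.3},
    {"A": 0.5, "C": -1.0, "D": 0.0, "E": 1.5},
    {"A": -0.4, "C": 0.0, "D": 1.0, "E": -1.2},
]
OFF_TARGET = [
    {"A": 0.0, "C": 1.0, "D": -0.5, "E": 0.0},
    {"A": 0.0, "C": 0.0, "D": 0.0, "E": 0.2},
    {"A": 0.0, "C": 0.0, "D": 1.5, "E": 0.0},
]


def features(seq):
    """Indices of the active one-hot features, one per (position, residue)."""
    return [pos * len(ALPHABET) + ALPHABET.index(aa) for pos, aa in enumerate(seq)]


def true_value(seq, effects):
    """Hidden continuous property of a variant under the simulated effects."""
    return sum(effects[pos][aa] for pos, aa in enumerate(seq))


def simulate_sort(seqs, effects, cells=1000):
    """Simulated fraction of each variant's cells landing in the positive gate."""
    return {s: round(cells / (1 + math.exp(-true_value(s, effects)))) / cells
            for s in seqs}


def fit_logistic(fractions, lr=1.0, epochs=1000):
    """Batch gradient-descent logistic regression with soft (fractional) labels."""
    w = [0.0] * (LENGTH * len(ALPHABET))
    b = 0.0
    data = [(features(s), y) for s, y in fractions.items()]
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for idx, y in data:
            p = 1 / (1 + math.exp(-(b + sum(w[i] for i in idx))))
            for i in idx:
                grad_w[i] += p - y
            grad_b += p - y
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def predict(seq, model):
    """Continuous score: the model's log-odds of sorting into the positive gate."""
    w, b = model
    return b + sum(w[i] for i in features(seq))


all_seqs = ["".join(t) for t in itertools.product(ALPHABET, repeat=LENGTH)]
unseen = {"CED", "EEC", "AEC", "DAD"}  # hypothetical variants never sequenced
observed = [s for s in all_seqs if s not in unseen]

on_model = fit_logistic(simulate_sort(observed, ON_TARGET))
off_model = fit_logistic(simulate_sort(observed, OFF_TARGET))

# Exhaustive search stands in for integer linear programming here: maximize the
# predicted on-target score subject to a cap on the predicted off-target score.
OFF_TARGET_CAP = 0.5
feasible = [s for s in all_seqs if predict(s, off_model) <= OFF_TARGET_CAP]
best = max(feasible, key=lambda s: predict(s, on_model))
print(best, round(predict(best, on_model), 2), best in observed)
```

Under these hypothetical effects, the search should pick `EEC`, a variant absent from the simulated training reads. The model can score it because its coefficients are per-residue and additive. A real library is far too large to enumerate. There, the same problem can be written as an integer linear program with one binary variable per (position, residue), a constraint that each position takes exactly one residue, the on-target coefficients as the objective, and the off-target coefficients in a linear inequality. The model intercepts enter only as constants.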

Publisher: Cold Spring Harbor Laboratory

Cited by 3 articles.

1. Machine learning to predict continuous protein properties from binary cell sorting data and map unseen sequence space. Proceedings of the National Academy of Sciences, 2024-03-07.

2. Position-Specific Enrichment Ratio Matrix scores predict antibody variant properties from deep sequencing data. Bioinformatics, 2023-07-21.

3. Position-Specific Enrichment Ratio Matrix scores predict antibody variant properties from deep sequencing data. 2023-07-11.