ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction

Published: 2023-12-08

Author:

Notin Pascal, Kollasch Aaron W., Ritter Daniel, van Niekerk Lood, Paul Steffanie, Spinner Hansen, Rollins Nathan, Shaw Ada, Weitzman Ruben, Frazer Jonathan, Dias Mafalda, Franceschi Dinko, Orenbuch Rose, Gal Yarin, Marks Debora S.

Abstract

Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, and curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We integrate a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) into a unified benchmark suite and report their performance. We open-source the corresponding codebase, datasets, MSAs, structures, and model predictions, and develop a user-friendly website that facilitates data access and analysis.

Publisher

Cold Spring Harbor Laboratory

References: 230 articles.

1. Evolving New Protein-Protein Interaction Specificity through Promiscuous Intermediates

2. Protein Model Discrimination Using Mutational Sensitivity Derived from Deep Sequencing

3. A method and server for predicting damaging missense mutations

4. A Combined Approach Reveals a Regulatory Mechanism Coupling Src’s Kinase Activity, Localization, and Phosphotransferase-Independent Functions

5. ClinPred: Prediction Tool to Identify Disease-Relevant Nonsynonymous Single-Nucleotide Variants

Cited by 41 articles.

1. AI-accelerated therapeutic antibody development: practical insights. Frontiers in Drug Discovery. 2024-09-03

2. Protein stability models fail to capture epistatic interactions of double point mutations. 2024-08-21

3. Unlocking Protein Evolution Insights: Efficient and Interpretable Mutational Effect Predictions with GEMME. 2024-08-17

4. Results of the Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design. 2024-08-12

5. Adapting protein language models for structure-conditioned design. 2024-08-03