Benchmarking Uncertainty Quantification for Protein Engineering

Published: 2023-04-18

Authors:

Kevin P. Greenman, Ava P. Amini, Kevin K. Yang

Abstract

Machine learning sequence-function models for proteins could enable significant advances in protein engineering, especially when paired with state-of-the-art methods to select new sequences for property optimization and/or model improvement. Such methods (Bayesian optimization and active learning) require calibrated estimates of model uncertainty. While studies have benchmarked a variety of deep learning uncertainty quantification (UQ) methods on standard and molecular machine-learning datasets, it is not clear whether these results extend to protein datasets. In this work, we implemented a panel of deep learning UQ methods on regression tasks from the Fitness Landscape Inference for Proteins (FLIP) benchmark. We compared results across different degrees of distributional shift using metrics that assess each UQ method's accuracy, calibration, coverage, width, and rank correlation. Additionally, we compared these metrics using one-hot encoding and pretrained language model representations, and we tested the UQ methods in a retrospective active learning setting. These benchmarks enable us to provide recommendations for more effective design of biological sequences using machine learning.

Publisher

Cold Spring Harbor Laboratory

References (37 articles)

1. Machine-learning-guided directed evolution for protein engineering. Nature Methods, 2019.

2. Kendall, A., and Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems 30.

3. Dallago, C., Mou, J., Johnston, K. E., Wittmann, B., Bhattacharya, N., Goldman, S., Madani, A., and Yang, K. K. (2021). FLIP: Benchmark tasks in fitness landscape inference for proteins. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2).

4. Evaluating scalable uncertainty estimation methods for deep learning-based molecular property prediction. Journal of Chemical Information and Modeling, 2020.

5. Methods for comparing uncertainty quantifications for material property predictions. Machine Learning: Science and Technology, 2020.

Cited by 4 articles

1. Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering. ACS Central Science, 2024-02-05.

2. Coherent Blending of Biophysics-Based Knowledge with Bayesian Neural Networks for Robust Protein Property Prediction. ACS Synthetic Biology, 2023-10-27.

3. Accelerating Biocatalysis Discovery with Machine Learning: A Paradigm Shift in Enzyme Engineering, Discovery, and Design. ACS Catalysis, 2023-10-26.

4. Linear-Scaling Kernels for Protein Sequences and Small Molecules Outperform Deep Learning While Providing Uncertainty Quantitation and Improved Interpretability. Journal of Chemical Information and Modeling, 2023-07-27.