Abstract
Machine learning sequence-function models for proteins could enable significant advances in protein engineering, especially when paired with state-of-the-art methods to select new sequences for property optimization and/or model improvement. Such methods (Bayesian optimization and active learning) require calibrated estimations of model uncertainty. While studies have benchmarked a variety of deep learning uncertainty quantification (UQ) methods on standard and molecular machine-learning datasets, it is not clear if these results extend to protein datasets. In this work, we implemented a panel of deep learning UQ methods on regression tasks from the Fitness Landscape Inference for Proteins (FLIP) benchmark. We compared results across different degrees of distributional shift using metrics that assess each UQ method's accuracy, calibration, coverage, width, and rank correlation. Additionally, we compared these metrics using one-hot encoding and pretrained language model representations, and we tested the UQ methods in a retrospective active learning setting. These benchmarks enable us to provide recommendations for more effective design of biological sequences using machine learning.
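The abstract names calibration, coverage, width, and rank correlation without defining them. As a rough illustration only, the sketch below shows how such metrics are commonly computed for a regression model that outputs a predictive mean and standard deviation. It assumes Gaussian predictive distributions and 95% intervals; the function names, the interval level, and the exact formulas are assumptions made here for illustration and may differ from the definitions used in the paper.

```python
import numpy as np
from statistics import NormalDist


def coverage(y_true, mean, std, z=1.96):
    """Fraction of true values inside the interval mean +/- z * std."""
    y_true, mean, std = (np.asarray(a, dtype=float) for a in (y_true, mean, std))
    return float(np.mean(np.abs(y_true - mean) <= z * std))


def mean_width(std, z=1.96, y_range=1.0):
    """Average interval width 2 * z * std, divided by y_range to normalize scale."""
    return float(np.mean(2.0 * z * np.asarray(std, dtype=float)) / y_range)


def _ranks(x):
    # Ordinal ranks via double argsort; ties are not averaged (a simplification).
    return np.argsort(np.argsort(x)).astype(float)


def uncertainty_rank_correlation(y_true, mean, std):
    """Spearman-style correlation between absolute error and predicted std."""
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(mean, dtype=float))
    return float(np.corrcoef(_ranks(err), _ranks(np.asarray(std, dtype=float)))[0, 1])


def miscalibration_area(y_true, mean, std, n_levels=99):
    """Mean absolute gap between nominal and observed coverage of central intervals.

    Assumes Gaussian predictive distributions (an assumption of this sketch).
    """
    y_true, mean, std = (np.asarray(a, dtype=float) for a in (y_true, mean, std))
    levels = np.linspace(0.01, 0.99, n_levels)
    z_abs = np.abs(y_true - mean) / std
    normal = NormalDist()
    # For each nominal level p, the central interval half-width in std units
    # is the Gaussian quantile at 0.5 + p / 2.
    observed = np.array(
        [np.mean(z_abs <= normal.inv_cdf(0.5 + p / 2.0)) for p in levels]
    )
    return float(np.mean(np.abs(observed - levels)))
```

Under these assumptions, a well-calibrated model has coverage near the nominal interval level and a miscalibration area near zero, while a positive rank correlation indicates that larger predicted uncertainties tend to accompany larger errors. Width is only informative alongside coverage, since very wide intervals achieve high coverage trivially.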
Publisher
Cold Spring Harbor Laboratory