Abstract
Protein sequence likelihood models (PSLMs) are an emerging class of self-supervised deep learning algorithms which learn distributions over amino acid identities in structural and evolutionary contexts. Recently, PSLMs have demonstrated impressive performance in predicting the relative fitness of variant sequences without any task-specific training. In this work, we comprehensively analyze the capacity of six PSLMs to predict experimental measurements of thermostability for variants of hundreds of heterogeneous proteins. We assess performance of PSLMs relative to state-of-the-art supervised models, highlight relative strengths and weaknesses, and examine the complementarity between these models. We focus our analyses on stability engineering applications, assessing which methods and combinations of methods can most consistently identify and prioritize mutations for experimental validation. Our results indicate that structure-based PSLMs have competitive performance with the best existing supervised methods and can augment the predictions of supervised methods by integrating insights from their disparate training objectives.
Publisher
Cold Spring Harbor Laboratory
Cited by
3 articles.