Abstract
The stability of a protein is crucial to its utility in industrial applications. While engineering campaigns can now routinely enhance protein thermal stability to the level needed in an industrial setting, there is a strong desire to fast-track these efforts with predictive tools that let one reach a highly stabilized protein in a minimal number of design iterations. In this work, we explore using a mega-scale dataset to develop a protein language model tuned for stability. This model is trained on the folding stability of 528k sequences derived from 461 small protein domains and designs, and it can accommodate deletions, insertions, and multiple-point mutations. We show that a protein language model can be fine-tuned to predict folding stability. The fine-tuned protein language model, named ESMtherm, performs reasonably on small protein domains and generalizes to sequences distal from the training set. Lastly, we discuss its limitations relative to other state-of-the-art methods in generalizing to larger protein scaffolds, and we highlight the need for large-scale stability measurements on a diverse dataset that represents the distribution of sequence lengths commonly observed in nature.
Publisher
Cold Spring Harbor Laboratory