Enhancing predictions of protein stability changes induced by single mutations using MSA-based Language Models

Published: 2024-04-14

Author:

Francesca Cuturello, Marco Celoria, Alessio Ansuini, Alberto Cazzaniga

Abstract

Protein Language Models offer a new perspective for addressing challenges in structural biology, while relying solely on sequence information. Recent studies have investigated their effectiveness in forecasting shifts in thermodynamic stability caused by single amino acid mutations, a task known for its complexity due to the sparse availability of data, constrained by experimental limitations. To tackle this problem, we introduce two key novelties: leveraging a Protein Language Model that incorporates Multiple Sequence Alignments to capture evolutionary information, and using a recently released mega-scale dataset with rigorous data pre-processing to mitigate overfitting. We ensure comprehensive comparisons by fine-tuning various pre-trained models, taking advantage of analyses such as ablation studies and baselines evaluation. Our methodology introduces a stringent policy to reduce the widespread issue of data leakage, rigorously removing sequences from the training set when they exhibit significant similarity with the test set. The MSA Transformer emerges as the most accurate among the models under investigation, given its capability to leverage co-evolution signals encoded in aligned homologous sequences. Moreover, the optimized MSA Transformer outperforms existing methods and exhibits enhanced generalization power, leading to a notable improvement in predicting changes in protein stability resulting from point mutations. Code and data are available at https://github.com/RitAreaSciencePark/PLM4Muts.
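
The leakage policy described in the abstract (dropping training sequences that are too similar to any test sequence) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state the similarity measure or cutoff, so the function names, the 0.5 threshold, and the use of Python's `difflib.SequenceMatcher` as a crude stand-in for alignment-based sequence identity are all assumptions made for illustration. The toy sequences are invented.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; an assumed stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_training_set(train, test, threshold=0.5):
    """Keep training sequences whose similarity to every test sequence is below threshold.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return [s for s in train if all(similarity(s, t) < threshold for t in test)]


# Toy, made-up sequences: the first two training entries overlap heavily with the test set.
train = ["MKTAYIAKQR", "MKTAYIAKQL", "GSHWWPPLLV"]
test = ["MKTAYIAKQR"]

kept = filter_training_set(train, test, threshold=0.5)
print(kept)
```

In practice, a filter of this kind would likely rely on a dedicated sequence search or clustering tool rather than a string-matching ratio; the sketch only shows the shape of the train/test exclusion step.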

Publisher

Cold Spring Harbor Laboratory

References: 72 articles.

1. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.

2. Bertology meets biology: interpreting attention in protein language models. arXiv preprint, 2020.

3. Learning the protein language: Evolution, structure, and function. Cell Systems, 2021.

4. Learning meaningful representations of protein sequences. Nature Communications, 2022.

5. Lucrezia Valeriani, Diego Doimo, Francesca Cuturello, Alessandro Laio, Alessio Ansuini, and Alberto Cazzaniga. The geometry of hidden representations of large transformer models. Advances in Neural Information Processing Systems, 36, 2024.

Cited by 1 article.

1. Aligning protein generative models with experimental fitness via Direct Preference Optimization. 2024-05-21.