Protein language models can capture protein quaternary state

Published: 2023-04-02

Author:

Avraham Orly, Tsaban Tomer, Ben-Aharon Ziv, Tsaban Linoy, Schueler-Furman Ora

Abstract

Background

Determining a protein’s quaternary state, i.e. how many monomers assemble together to form the functioning unit, is a critical step in protein characterization, and deducing it is not trivial. Many proteins form multimers for their activity, and over 50% are estimated to naturally form homomultimers. Experimental quaternary state determination can be challenging and require extensive work. To complement these efforts, a number of computational tools have been developed for quaternary state prediction, often utilizing experimentally validated structural information. Recently, dramatic advances have been made in the field of deep learning for predicting protein structure and other characteristics. Protein language models that apply computational natural-language models to proteins successfully capture secondary structure, protein cell localization and other characteristics, from a single sequence. Here we hypothesize that information about the protein quaternary state may be contained within protein sequences as well, allowing us to benefit from these novel approaches in the context of quaternary state prediction.

Results

We generated embeddings for a large dataset of quaternary state labels, extracted from the curated QSbio dataset. We then trained a model for quaternary state classification and assessed it on a non-overlapping set of distinct folds (ECOD family level). Our model, named QUEEN (QUaternary state prediction using dEEp learNing), performs worse than approaches that include information from solved crystal structures. However, we show that it successfully learned to distinguish multimers from monomers, and that the specific quaternary state is predicted with moderate success, better than a simple model that transfers annotation based on sequence similarity. Our results demonstrate that complex, quaternary state related information is included in these embeddings.

Conclusions

QUEEN is the first to investigate the power of embeddings for the prediction of the quaternary state of proteins. As such, it lays out the strength as well as limitations of a sequence-based protein language model approach compared to structure-based approaches. Since it does not require any structural information and is fast, we anticipate that it will be of wide use both for in-depth investigation of specific systems, as well as for studies of large sets of protein sequences. A simple colab implementation is available at: https://colab.research.google.com/github/Orly-A/QUEEN_prediction/blob/main/QUEEN_prediction_notebook.ipynb
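
The evaluation idea described in the abstract (train a classifier on per-protein embeddings, then test it only on protein families that were never seen during training) can be illustrated with a minimal sketch. This is not the QUEEN implementation: the random vectors below stand in for language-model embeddings, the nearest-centroid rule stands in for the trained classifier, and the family names, label values and all function names are hypothetical.

```python
import random
from collections import defaultdict


def make_toy_data(n_families=10, per_family=4, dim=8, seed=0):
    """Synthetic stand-in for embeddings: label 1 = monomer, 2 = dimer (assumed)."""
    rng = random.Random(seed)
    X, y, groups = [], [], []
    for fam in range(n_families):
        label = 1 if fam % 2 == 0 else 2
        center = 0.0 if label == 1 else 3.0
        for _ in range(per_family):
            X.append([rng.gauss(center, 1.0) for _ in range(dim)])
            y.append(label)
            groups.append(f"family_{fam}")  # hypothetical family identifier
    return X, y, groups


def split_by_group(groups, test_fraction=0.2, seed=0):
    """Split indices so that no group (family) occurs in both train and test."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    n_test = max(1, int(round(len(unique) * test_fraction)))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


def fit_centroids(X, y):
    """Compute the mean vector of each label (a deliberately simple classifier)."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for j, value in enumerate(vec):
            sums[label][j] += value
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda lab: sqdist(centroids[lab], vec))


X, y, groups = make_toy_data()
train_idx, test_idx = split_by_group(groups)
centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
predictions = [predict(centroids, X[i]) for i in test_idx]
accuracy = sum(p == y[i] for p, i in zip(predictions, test_idx)) / len(test_idx)
print(f"held-out families: {sorted({groups[i] for i in test_idx})}")
print(f"accuracy on held-out families: {accuracy:.2f}")
```

Splitting by family rather than by individual protein keeps close relatives of a test protein out of the training set, so the score reflects generalization to unseen families rather than recall of near-duplicates.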

Publisher

Cold Spring Harbor Laboratory

References: 39 articles.

1. Structural Symmetry and Protein Function

2. Protein quaternary structures in solution are a mixture of multiple forms; Chem Sci, 2022

3. Three-dimensional structure of beta-galactosidase from E. coli; Nature, 1994

4. HTRA1 Mutations Identified in Symptomatic Carriers Have the Property of Interfering the Trimer-Dependent Activation Cascade; Front Neurol, 2019

5. Gene Ontology: tool for the unification of biology

Cited by 1 article.

1. Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning; 2023-11-10