Capturing Protein Domain Structure and Function Using Self-Supervision on Domain Architectures

Published:2021-01-19 Issue:1 Volume:14 Page:28
ISSN:1999-4893
Container-title:Algorithms
language:en
Short-container-title:Algorithms

Author:

Melidis Damianos P., Nejdl Wolfgang

Abstract

Protein sequence embeddings have been shown to improve the prediction of biological properties of unseen proteins. However, these embeddings have a caveat: biological metadata do not exist for each amino acid, so the quality of each learned embedding vector cannot be measured separately. Therefore, current sequence embeddings cannot be intrinsically evaluated, in a quantitative manner, on the degree of biological information they capture. We address this drawback with our approach, dom2vec, which learns vector representations for protein domains rather than for each amino acid, as biological metadata do exist for each domain separately. To perform a reliable quantitative intrinsic evaluation in terms of biological knowledge, we selected the metadata related to the most distinctive biological characteristics of a domain: its structure, enzymatic function, and molecular function. Notably, dom2vec obtains an adequate level of performance in the intrinsic assessment; therefore, we can draw an analogy between the local linguistic features in natural languages and the domain structure and function information in domain architectures. Moreover, we demonstrate the applicability of dom2vec to protein prediction tasks by comparing it with state-of-the-art sequence embeddings in three downstream tasks. We show that dom2vec outperforms sequence embeddings for toxin and enzymatic function prediction and is comparable with sequence embeddings in cellular location prediction.
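The analogy the abstract draws between natural language and domain architectures can be illustrated with a small sketch. It assumes, as the name dom2vec suggests but the abstract does not state, that each domain architecture is treated like a sentence and each domain like a word, and it shows only how skip-gram style (target, context) training pairs could be generated. The function name `skipgram_pairs`, the `window` parameter, and the domain identifiers are hypothetical placeholders, not the authors' implementation or real annotations.

```python
from collections import Counter


def skipgram_pairs(architectures, window=2):
    """Return (target, context) domain pairs from each architecture.

    Each architecture is a list of domain identifiers in sequence order,
    treated here like a sentence of words (an assumption, see lead-in).
    """
    pairs = []
    for arch in architectures:
        for i, target in enumerate(arch):
            # Context = neighbours within `window` positions of the target
            start = max(0, i - window)
            stop = min(len(arch), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pairs.append((target, arch[j]))
    return pairs


# Hypothetical toy architectures; identifiers are placeholders
architectures = [
    ["DOM_A", "DOM_B", "DOM_C"],
    ["DOM_B", "DOM_C"],
]

pairs = skipgram_pairs(architectures, window=1)
pair_counts = Counter(pairs)
```

Pairs like these would be the input to an embedding model that learns one vector per domain. Because every vector then corresponds to a domain with its own metadata, each vector can be checked individually, which is the intrinsic evaluation the abstract argues is not possible per amino acid.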

Funder

Niedersächsisches Ministerium für Wissenschaft und Kultur

Publisher

MDPI AG

Subject

Computational Mathematics,Computational Theory and Mathematics,Numerical Analysis,Theoretical Computer Science

Link

https://www.mdpi.com/1999-4893/14/1/28/pdf

Reference: 43 articles.

1. Arrangements in the modular evolution of proteins

2. Evolution of protein domain architectures;Forslund,2012

3. Using Functional Domain Composition and Support Vector Machines for Prediction of Protein Subcellular Location

4. Predicting protein function from domain content

5. UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB

Cited by 4 articles.

1. SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models;Genome Biology;2024-07-25

2. In vitro continuous protein evolution empowered by machine learning and automation;Cell Systems;2023-08

3. An immuno-informatics approach for annotation of hypothetical proteins and multi-epitope vaccine designed against the Mpox virus;Journal of Biomolecular Structure and Dynamics;2023-07-31

4. SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models;2023-05-15