GO2Vec: transforming GO terms and proteins to vector representations via graph embeddings

Published:2019-12 Issue:S9 Volume:20
ISSN:1471-2164
Container-title:BMC Genomics
language:en
Short-container-title:BMC Genomics

Author:

Zhong Xiaoshi, Kaalia Rama, Rajapakse Jagath C.

Abstract

Background: Semantic similarity between Gene Ontology (GO) terms is a fundamental measure for many bioinformatics applications, such as determining functional similarity between genes or proteins. Most previous research exploited information content to estimate the semantic similarity between GO terms; more recently, some research exploited word embeddings to learn vector representations for GO terms from a large-scale corpus. In this paper, we propose a novel method, named GO2Vec, that exploits graph embeddings to learn vector representations for GO terms from the GO graph. GO2Vec combines the information from both the GO graph and GO annotations, and its learned vectors can be applied to a variety of bioinformatics applications, such as calculating functional similarity between proteins and predicting protein-protein interactions.

Results: We conducted two kinds of experiments to evaluate the quality of GO2Vec: (1) functional similarity between proteins on the Collaborative Evaluation of GO-based Semantic Similarity Measures (CESSM) dataset and (2) prediction of protein-protein interactions on the Yeast and Human datasets from the STRING database. Experimental results demonstrate the effectiveness of GO2Vec over the information content-based measures and the word embedding-based measures.

Conclusion: Our experimental results demonstrate the effectiveness of using graph embeddings to learn vector representations from undirected GO and GOA graphs. Our results also demonstrate that GO annotations provide useful information for computing the similarity between GO terms and between proteins.
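The abstract states that the learned vectors can be used to calculate functional similarity between proteins, but it does not say how term-level vectors are aggregated. As a minimal sketch only, the Python below assumes cosine similarity between GO term vectors and a best-match average over each protein's annotated terms; both choices are assumptions for illustration, not the aggregation confirmed by the abstract. The term identifiers, protein names, and two-dimensional vectors are hypothetical toy values, not GO2Vec output.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match_average(terms_a, terms_b, vectors):
    """Protein-level similarity from term vectors (assumed aggregation).

    For each term of one protein, take its best cosine match among the
    other protein's terms, average those maxima, and symmetrize.
    """
    if not terms_a or not terms_b:
        return 0.0

    def directed(src, dst):
        best = [max(cosine(vectors[s], vectors[d]) for d in dst) for s in src]
        return sum(best) / len(best)

    return 0.5 * (directed(terms_a, terms_b) + directed(terms_b, terms_a))


# Hypothetical toy data: placeholder term IDs and made-up 2-D embeddings.
toy_vectors = {
    "GO:A": [1.0, 0.0],
    "GO:B": [0.0, 1.0],
    "GO:C": [1.0, 1.0],
}
toy_annotations = {
    "P1": ["GO:A", "GO:C"],
    "P2": ["GO:A"],
    "P3": ["GO:B"],
}

if __name__ == "__main__":
    for pair in [("P1", "P2"), ("P1", "P3"), ("P2", "P3")]:
        a, b = pair
        score = best_match_average(toy_annotations[a], toy_annotations[b], toy_vectors)
        print(a, b, round(score, 4))
```

In this toy setup, P1 and P2 share a term and therefore score higher than P1 and P3, which only overlap partially through the placeholder term GO:C. Scores of this kind could also serve as input features for an interaction predictor, though the abstract does not describe that pipeline.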

Publisher

Springer Science and Business Media LLC

Subject

Genetics,Biotechnology

Link

http://link.springer.com/content/pdf/10.1186/s12864-019-6272-2.pdf

Reference 43 articles.

1. Consortium GO. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004; 32:258–61.

2. Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S. The generic genome browser: A building block for a model organism system database. Genome Res. 2002; 12:1599–610.

3. Consortium U. UniProt: a hub for protein information. Nucleic Acids Res. 2014; 43(D1):204–12.

4. Kriventseva EV, Fleischmann W, Zdobnov EM, Apweiler R. CluSTr: a database of clusters of SWISS-PROT+TrEMBL proteins. Nucleic Acids Res. 2001; 29(1):33–36.

5. Jiang JJ, Conrath DW. Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of the 10th International Conference on Computational Linguistics. Taipei: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP): 1997. p. 19–33.

Cited by 20 articles.

1. Protein feature engineering framework for AMPylation site prediction;Scientific Reports;2024-04-15

2. Explaining protein–protein interactions with knowledge graph-based semantic similarity;Computers in Biology and Medicine;2024-03

3. Partial order relation–based gene ontology embedding improves protein function prediction;Briefings in Bioinformatics;2024-01-22

4. Deep learning‐assisted prediction of protein–protein interactions in Arabidopsis thaliana;The Plant Journal;2023-03-29

5. Time expression recognition and normalization: a survey;Artificial Intelligence Review;2023-01-24