ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval

Published:2012-05-08 Issue:S7 Volume:13 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Wang Jingyan,Gao Xin,Wang Quanquan,Li Yongping

Abstract

Background: The need to retrieve or classify protein molecules using structure- or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. Such pairwise measures neglect the distribution of the other proteins in the database and therefore cannot meet the accuracy requirements of retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning contextual dissimilarity/similarity measures can improve retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms are unsupervised and do not use the known class labels of the proteins in the database.

Results: In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure $d_{ij}$ by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins $(i, j)$, if their contexts $N(i)$ and $N(j)$ are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing $d_{ij}$ by a factor learned from the contexts $N(i)$ and $N(j)$. Moreover, we divide the context into hierarchical sub-contexts and obtain a contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select relevant protein pairs (pairs with the same class label) and irrelevant protein pairs (pairs with different labels), and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is then used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchical Context Coherently in an iterative algorithm, ProDis-ContSHC. We test the performance of ProDis-ContSHC on two benchmark sets: the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results demonstrate that retrieval systems using our supervised contextual dissimilarity measures significantly outperform those using context-free dissimilarity/similarity measures and those using other unsupervised contextual dissimilarity measures that do not use the class label information.

Conclusions: Using the contextual proteins in the database together with their class labels, we can dramatically improve the accuracy of pairwise dissimilarity/similarity measures for protein retrieval tasks. In this work, we propose for the first time the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among the contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature.
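The contextual regularization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the nested split of the ranked neighbor list, the average-linkage context dissimilarity, and the normalization by the mean dissimilarity are all assumptions made here for illustration. In particular, the fixed `weights` vector is a hypothetical stand-in for the regularizing factor that the paper learns with an SVM from labeled protein pairs, and every function name is invented.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def neighbors(d: Matrix, i: int, k: int) -> List[int]:
    """Context N(i): the k proteins closest to i under d, nearest first."""
    others = sorted((j for j in range(len(d)) if j != i), key=lambda j: d[i][j])
    return others[:k]


def sub_contexts(ctx: Sequence[int], levels: int) -> List[List[int]]:
    """Assumed hierarchy: level h keeps the nearest h/levels share of the context."""
    n = len(ctx)
    return [list(ctx[: max(1, (n * h) // levels)]) for h in range(1, levels + 1)]


def context_dissimilarity(d: Matrix, ci: Sequence[int], cj: Sequence[int]) -> float:
    """Average base dissimilarity between the members of two (sub-)contexts."""
    return sum(d[a][b] for a in ci for b in cj) / (len(ci) * len(cj))


def contextual_vector(d: Matrix, i: int, j: int, k: int, levels: int) -> List[float]:
    """One context dissimilarity per hierarchy level for the pair (i, j)."""
    si = sub_contexts(neighbors(d, i, k), levels)
    sj = sub_contexts(neighbors(d, j, k), levels)
    return [context_dissimilarity(d, a, b) for a, b in zip(si, sj)]


def regularize(d: Matrix, k: int, levels: int, weights: Sequence[float]) -> Matrix:
    """Scale each d[i][j] by a factor computed from its contextual vector.

    `weights` is a placeholder for the SVM-learned factor of the paper.
    """
    n = len(d)
    mean_d = sum(d[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    new = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            vec = contextual_vector(d, i, j, k, levels)
            factor = sum(w * v for w, v in zip(weights, vec)) / mean_d
            new[i][j] = new[j][i] = d[i][j] * factor
    return new


def iterate(d: Matrix, k: int, levels: int, weights: Sequence[float], rounds: int) -> Matrix:
    """Alternate between updating the dissimilarities and the contexts built on them."""
    for _ in range(rounds):
        d = regularize(d, k, levels, weights)
    return d


# Toy data: two well-separated groups of points on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
base = [[abs(a - b) for b in points] for a in points]
refined = iterate(base, k=2, levels=2, weights=[0.5, 0.5], rounds=2)
print(refined[0][1], refined[0][3])
```

In this toy setting, pairs whose contexts overlap are pulled closer together and pairs with dissimilar contexts are pushed apart, which is the qualitative behavior the abstract describes. A supervised variant would replace `weights` with the output of a classifier trained on the contextual vectors of relevant and irrelevant pairs.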

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-13-S7-S2.pdf

References (70 articles)

1. Chen SA, Lee TY, Ou YY: Incorporating significant amino acid pairs to identify O-linked glycosylation sites on transmembrane proteins and non-transmembrane proteins. BMC Bioinformatics 2010, 11: 536. 10.1186/1471-2105-11-536

2. Sobolev B, Filimonov D, Lagunin A, Zakharov A, Koborova O, Kel A, Poroikov V: Functional classification of proteins based on projection of amino acid sequences: application for prediction of protein kinase substrates. BMC Bioinformatics 2010, 11: 313. 10.1186/1471-2105-11-313

3. Albayrak A, Otu HH, Sezerman UO: Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets. BMC Bioinformatics 2010, 11: 428. 10.1186/1471-2105-11-428

4. Ezkurdia I, Bartoli L, Fariselli P, Casadio R, Valencia A, Tress ML: Progress and challenges in predicting protein-protein interaction sites. Brief Bioinform 2009, 10(3):233–246.

5. Cook T, Sutton R, Buckley K: Automated flexion crease identification using internal image seams. Pattern Recognition 2010, 43(3):630–635. 10.1016/j.patcog.2009.08.012

Cited by 25 articles.

1. Random forest method for predicting protein ligand–binding residues;Computational Intelligence in Protein-Ligand Interaction Analysis;2024

2. From Restricted Equivalence Functions on $L^{n}$ to Similarity Measures Between Fuzzy Multisets;IEEE Transactions on Fuzzy Systems;2023-08

3. A New Similarity Space Tailored for Supervised Deep Metric Learning;ACM Transactions on Intelligent Systems and Technology;2022-11-09

4. Metric Learning via Penalized Optimization;Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining;2021-08-14

5. Large-Scale Multi-modal Distance Metric Learning with Application to Content-Based Information Retrieval and Image Classification;International Journal of Pattern Recognition and Artificial Intelligence;2020-05-26