Using homology relations within a database markedly boosts protein sequence similarity search

Published:2015-05-18 Issue:22 Volume:112 Page:7003-7008
ISSN:0027-8424
Container-title:Proceedings of the National Academy of Sciences
language:en
Short-container-title:Proc Natl Acad Sci USA

Author:

Tong Jing, Sadreyev Ruslan I., Pei Jimin, Kinch Lisa N., Grishin Nick V.

Abstract

Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence–based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit’s known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.
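The idea in the abstract, judging a hit by how the query scores against that hit's known homologs as well as against the hit itself, can be illustrated with a toy re-scoring step. This is only a minimal sketch under stated assumptions, not COMPADRE's actual scoring: the function name `rescore_with_homologs`, the `weight` mixing parameter, the simple averaging rule, and the example scores are all hypothetical.

```python
from statistics import mean


def rescore_with_homologs(direct_scores, homologs, weight=0.5):
    """Blend each hit's direct score with the mean score of its known homologs.

    direct_scores: dict mapping database entry -> query-vs-entry similarity score
    homologs: dict mapping database entry -> list of entries known to be homologous
    weight: hypothetical mixing parameter (0 keeps the direct score unchanged)
    """
    rescored = {}
    for hit, score in direct_scores.items():
        # Scores of the query against the hit's known homologs, where available.
        neighbor_scores = [
            direct_scores[h] for h in homologs.get(hit, []) if h in direct_scores
        ]
        if neighbor_scores:
            rescored[hit] = (1 - weight) * score + weight * mean(neighbor_scores)
        else:
            # No known homologs: fall back to the direct score.
            rescored[hit] = score
    return rescored


# Illustrative (made-up) scores: B and D both score weakly against the query,
# but B is a known homolog of two strongly scoring entries, A and C.
direct = {"A": 0.9, "B": 0.2, "C": 0.8, "D": 0.3}
known = {"B": ["A", "C"], "D": []}
result = rescore_with_homologs(direct, known)
```

In this toy setup, entry B is promoted above entry D because its known homologs match the query well, which is the kind of evidence a purely pairwise comparison would miss.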

Funder

HHS | National Institutes of Health

Welch Foundation

Publisher

Proceedings of the National Academy of Sciences

Subject

Multidisciplinary

Reference: 32 articles.

1. The Next-Generation Sequencing Revolution and Its Impact on Genomics

2. Next-Generation Sequencing Platforms

3. Profile analysis: detection of distantly related proteins.

4. Assessment of template-based protein structure predictions in CASP10

5. On the origin and highly likely completeness of single-domain protein structures

Cited by 7 articles.

1. Identification of Two Flip-Over Genes in Grass Family as Potential Signature of C4 Photosynthesis Evolution;International Journal of Molecular Sciences;2023-09-15

2. Molecular cloning, characterization, and expression analysis of TIPE1 in chicken (Gallus gallus): Its applications in fatty liver hemorrhagic syndrome;International Journal of Biological Macromolecules;2022-05

3. A Novel Long- and Short-Term Memory Network with Time Series Data Analysis Capabilities;Mathematical Problems in Engineering;2020-10-13

4. UBTOR/KIAA1024 regulates neurite outgrowth and neoplasia through mTOR signaling;PLOS Genetics;2018-08-06

5. A low-complexity add-on score for protein remote homology search with COMER;Bioinformatics;2018-01-30