Dynamics of domain coverage of the protein sequence universe

Published:2012-11-16 Issue:1 Volume:13
ISSN:1471-2164
Container-title:BMC Genomics
language:en
Short-container-title:BMC Genomics

Author:

Rekapalli Bhanu,Wuichet Kristin,Peterson Gregory D,Zhulin Igor B

Abstract

Background: The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”.

Results: Here we suggest that the true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain.

Conclusions: Recent improvements in computational domain modeling result in a slow decrease in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data.

Publisher

Springer Science and Business Media LLC

Subject

Genetics,Biotechnology

Link

https://link.springer.com/content/pdf/10.1186/1471-2164-13-634.pdf

Reference: 35 articles.

1. Levitt M: Nature of the protein universe. Proc Natl Acad Sci USA. 2009, 106: 11079-11084. 10.1073/pnas.0905029106.

2. Koonin EV, Wolf Y, Karev GP: The structure of the protein universe and genome evolution. Nature. 2002, 420: 218-223. 10.1038/nature01256.

3. Chothia C, Gough J, Vogel C, Teichmann SA: Evolution of the protein repertoire. Science. 2003, 300: 1701-1703. 10.1126/science.1085371.

4. Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotech. 2008, 26: 1135-1145. 10.1038/nbt1486.

5. Kahn SD: On the future of genomic data. Science. 2011, 331: 728-729. 10.1126/science.1197891.

Cited by 10 articles.

1. Protein classification based on graphical encoding and convolutional neural network;2023 3rd International Conference on Electronic Engineering (ICEEM);2023-10-07

2. Gene Ontology Capsule GAN: an improved architecture for protein function prediction;PeerJ Computer Science;2022-08-15

3. DeepFunc: A Deep Learning Framework for Accurate Prediction of Protein Functions from Protein Sequences and Interactions;PROTEOMICS;2019-05-27

4. Compositionally Biased Dark Matter in the Protein Universe;PROTEOMICS;2018-10-29

5. Exploring the dark foldable proteome by considering hydrophobic amino acids topology;Scientific Reports;2017-01-30