Whole Proteome Clustering of 2,307 Genomes Reveals Remarkable Conservation of Four Proteins Among Proteobacteria While Revealing Significant Annotation Issues

Published: 2018-06-21

Authors:

Svetlana Lockwood, Kelly A. Brayton, Jeff A. Daily, Shira L. Broschat

Abstract

To explore the concept of a minimal gene set, we clustered 8.76 million protein sequences deduced from 2,307 completely sequenced Proteobacterial genomes. To our knowledge, this is the first study of this scale. Clustering produced 707,311 clusters, of which 224,442 contained between 2 and 2,894 sequences; the remaining 482,869 were singletons. The resulting clusters allowed us to ask: is a set of proteins conserved across all Proteobacteria? We chose four essential proteins representing fundamental cellular functions: the chaperonin GroEL, the DNA-dependent RNA polymerase subunits beta and beta’ (RpoB/RpoB’), and DNA polymerase I (PolA). We then examined their distribution in the clusters and found these proteins to be remarkably conserved. Although the groEL gene was present in every organism in the study, the GroEL protein was not represented in all of the deduced proteomes. The genes for RpoB and RpoB’ were missing from two genomes and merged in 88 genomes, and some sequences were sufficiently divergent that they formed separate clusters: 18 RpoB proteins fell into seven clusters and 14 RpoB’ proteins into three clusters. For PolA, 52 organisms lacked an identifiable sequence, and seven sequences were sufficiently divergent that they formed five separate clusters. Notably, the organisms lacking an identifiable PolA and those with divergent RpoB/RpoB’ were almost all endosymbionts. We also present a range of examples of annotation issues that caused deduced proteins to be incorrectly represented in the proteome. These annotation issues are a significant obstacle for high-throughput analyses.
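The presence/absence reasoning above can be illustrated with a small sketch. A protein counts as conserved if every genome contributes a member to the cluster(s) holding it; divergent homologs may fall into separate clusters, and a gene that was never annotated leaves its genome with no member at all. This is a minimal Python sketch, assuming the clustering output has already been reduced to a hypothetical protein-to-cluster mapping plus a protein-to-genome mapping. It is not the authors' clustering pipeline, and the function names, identifiers, and toy data are invented for illustration.

```python
from collections import Counter


def summarize_clusters(assignments):
    """Summarize cluster sizes from a (hypothetical) protein -> cluster mapping.

    Returns (total clusters, clusters with >= 2 members, singleton clusters,
    size of the largest cluster).
    """
    sizes = Counter(assignments.values())  # cluster_id -> number of members
    multi = sum(1 for n in sizes.values() if n >= 2)
    largest = max(sizes.values(), default=0)
    return len(sizes), multi, len(sizes) - multi, largest


def genomes_missing(assignments, protein_genome, cluster_ids, all_genomes):
    """Return the genomes with no protein in any of the given clusters."""
    present = {
        protein_genome[protein]
        for protein, cluster in assignments.items()
        if cluster in cluster_ids
    }
    return sorted(set(all_genomes) - present)


# Hypothetical toy data: four genomes, six deduced proteins.
# Cluster 0 stands in for a conserved protein and cluster 1 for a divergent
# homolog of it that clustered separately. Genome gD contributes no member,
# as when a gene is present but was not annotated as a protein.
assignments = {"gA_p1": 0, "gB_p1": 0, "gC_p7": 1,
               "gA_p2": 2, "gB_p5": 3, "gC_p2": 3}
protein_genome = {p: p.split("_")[0] for p in assignments}
all_genomes = ["gA", "gB", "gC", "gD"]

print(summarize_clusters(assignments))  # (4, 2, 2, 2)
print(genomes_missing(assignments, protein_genome, {0}, all_genomes))  # ['gC', 'gD']
print(genomes_missing(assignments, protein_genome, {0, 1}, all_genomes))  # ['gD']
```

Taking the union of clusters in the last call mirrors why divergent RpoB/RpoB’ or PolA sequences have to be traced across several clusters. A genome that is still missing afterwards points either to true gene loss or to an annotation problem, and this sketch alone cannot tell the two apart.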

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article

1. Whole Proteome Clustering of 2,307 Proteobacterial Genomes Reveals Conserved Proteins and Significant Annotation Issues. Frontiers in Microbiology, 2019-02-28.