Abstract
Experimental studies of Escherichia coli K-12 MG1655 often implicate poorly annotated genes in cellular phenotypes. However, we lack a systematic understanding of these genes. How many are there? What information is available for them? And what features do they share that could explain the gap in our understanding? Efforts to build predictive, whole-cell models of E. coli inevitably face this knowledge gap. We approached these questions systematically by assembling annotations from the knowledge bases EcoCyc, EcoGene, UniProt, RefSeq, and RegulonDB. We identified the genes that lack direct experimental evidence of function (the “y-ome”), which include 1563 of 4653 unique genes (34%), of which 131 have absolutely no evidence of function. An additional 304 genes (6.6%) are pseudogenes or phantom genes. y-ome genes tend to have lower expression levels and are enriched in the termination region of the E. coli chromosome. Where evidence is available for y-ome genes, it most often points to them being membrane proteins and transporters. We resolve the misconception that a gene in E. coli whose primary name starts with “y” is unannotated, and we discuss the value of the y-ome for systematic improvement of E. coli knowledge bases and its extension to other organisms.
Publisher
Cold Spring Harbor Laboratory
Cited by
3 articles.