Genome-wide discovery of missing genes in biological pathways of prokaryotes
-
Published:2011-02-15
Issue:S1
Volume:12
Page:
-
ISSN:1471-2105
-
Container-title:BMC Bioinformatics
-
language:en
-
Short-container-title:BMC Bioinformatics
Author:
Chen Yong,Mao Fenglou,Li Guojun,Xu Ying
Abstract
Abstract
Background
Reconstruction of biological pathways is typically done through mapping well-characterized pathways of model organisms to a target genome, through orthologous gene mapping. A limitation of such pathway-mapping approaches is that the mapped pathway models are constrained by the composition of the template pathways, e.g., some genes in a target pathway may not have corresponding genes in the template pathways, the so-called “missing gene” problem.
Methods
We present a novel pathway-expansion method for identifying additional genes that are possibly involved in a target pathway after pathway mapping, to fill holes caused by missing genes as well as to expand the mapped pathway model. The basic idea of the algorithm is to identify genes in the target genome whose homologous genes share common operons with homologs of any mapped pathway genes in some reference genome, and to add such genes to the target pathway if their functions are consistent with the cellular function of the target pathway.
Results
We have implemented this idea using a graph-theoretic approach and demonstrated the effectiveness of the algorithm on known pathways of E. coli in the KEGG database. On all KEGG pathways containing at least 5 genes, our method achieves an average of 60% positive predictive value (PPV) and the performance is increased with more seed genes added. Analysis shows that our method is highly robust.
Conclusions
An effective method is presented to find missing genes in biological pathways of prokaryotes, which achieves high prediction reliability on E. coli at a genome level. Numerous missing genes are found to be related to knwon E. coli pathways, which can be further validated through biological experiments. Overall this method is robust and can be used for functional inference.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology
Reference28 articles.
1. Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al.: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 2004, 32(Databaseissue):D258–261. 2. Wierling C, Herwig R, Lehrach H: Resources, standards and tools for systems biology. Brief Funct Genomic Proteomic 2007, 6(3):240–251. 10.1093/bfgp/elm027 3. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 2006, 34(Databaseissue):D354–357. 10.1093/nar/gkj102 4. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al.: The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003, 4: 41. 10.1186/1471-2105-4-41 5. Keseler IM, Bonavides-Martinez C, Collado-Vides J, Gama-Castro S, Gunsalus RP, Johnson DA, Krummenacker M, Nolan LM, Paley S, Paulsen IT, et al.: EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res 2009, 37(Databaseissue):D464–470. 10.1093/nar/gkn751
Cited by
11 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
|
|