Affiliation:
1. Bioinformatics Program
2. Department of Microbiology & Immunology, University of Michigan Medical School, Ann Arbor, Michigan 48109
3. Center for Statistical Genetics and Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan 48109
Abstract
Various computational approaches have been proposed for operon prediction, but most algorithms rely on experimental or functional data that are available for only a small subset of sequenced genomes. In this study, we explored the possibility of using phylogenetic information to aid in operon prediction, and we constructed a Bayesian hidden Markov model that combines comparative genomic data with traditional predictors, such as intergenic distances. The prediction algorithm performs as well as the best previously reported method, with several significant advantages. It uses fewer data sources, so it is easier to implement, and it is more broadly applicable than previous methods: it can be applied to essentially every gene in any sequenced bacterial genome. Furthermore, we show that near-optimal performance is easily reached with a generic set of comparative genomes and does not depend on a specific relationship between the subject genome and the comparative set. We applied the algorithm to the Bacillus anthracis genome and found that it successfully predicted all previously verified B. anthracis operons. To further test its performance, we chose a predicted operon (BA1489-92) containing several genes with little apparent functional relatedness and tested whether they are cotranscribed. Experimental evidence shows that these genes are cotranscribed, and the data have interesting implications for B. anthracis biology. Overall, our findings show that this algorithm is capable of highly sensitive and accurate operon prediction in a wide range of bacterial genomes and that these predictions can lead to the rapid discovery of new functional relationships among genes.
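As a rough illustration of how a hidden Markov model can combine intergenic distance with comparative genomic evidence, the sketch below labels each adjacent gene pair as lying within an operon or at an operon boundary. It is not the model described in this paper: it is a plain two-state HMM decoded with the Viterbi algorithm, and every probability, cutoff, and name in it (`P_SHORT`, `P_CONSERVED`, `short_cutoff`, `cons_cutoff`, `viterbi`) is a hypothetical placeholder chosen for readability rather than a value estimated from data.

```python
import math

STATES = ("operon", "boundary")

# All parameters below are illustrative assumptions, not values from the paper.
START = {"operon": 0.5, "boundary": 0.5}
TRANS = {
    "operon": {"operon": 0.7, "boundary": 0.3},
    "boundary": {"operon": 0.5, "boundary": 0.5},
}
# P(short intergenic distance | state) and P(adjacency conserved | state).
P_SHORT = {"operon": 0.85, "boundary": 0.25}
P_CONSERVED = {"operon": 0.75, "boundary": 0.15}


def emission_logp(state, distance_bp, conserved_fraction,
                  short_cutoff=50, cons_cutoff=0.5):
    """Log-probability of one gene pair's observations given a state.

    The two features are binarized with hypothetical cutoffs and treated
    as conditionally independent given the state.
    """
    is_short = distance_bp <= short_cutoff
    is_conserved = conserved_fraction >= cons_cutoff
    p = P_SHORT[state] if is_short else 1.0 - P_SHORT[state]
    p *= P_CONSERVED[state] if is_conserved else 1.0 - P_CONSERVED[state]
    return math.log(p)


def viterbi(pairs):
    """Return the most likely state path for a list of adjacent gene pairs.

    Each pair is (intergenic distance in bp, fraction of comparative
    genomes in which the two genes remain adjacent).
    """
    if not pairs:
        return []
    scores = [{s: math.log(START[s]) + emission_logp(s, *pairs[0])
               for s in STATES}]
    backpointers = []
    for obs in pairs[1:]:
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position.
            prev = max(STATES,
                       key=lambda p: scores[-1][p] + math.log(TRANS[p][s]))
            row[s] = (scores[-1][prev] + math.log(TRANS[prev][s])
                      + emission_logp(s, *obs))
            ptr[s] = prev
        scores.append(row)
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Four adjacent gene pairs: three close, conserved pairs and one distant pair.
example_pairs = [(10, 0.9), (15, 0.8), (400, 0.0), (5, 1.0)]
print(viterbi(example_pairs))
```

With these placeholder parameters, the distant, unconserved third pair is decoded as a boundary and the other three pairs as within an operon. A Bayesian treatment, as named in the abstract, would additionally place priors on the parameters instead of fixing them by hand.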
Publisher
American Society for Microbiology
Subject
Ecology,Applied Microbiology and Biotechnology,Food Science,Biotechnology
Cited by
36 articles.