Abstract
AbstractAutomatic de novo identification of the main regulons of a bacterium from genome and transcriptome data remains a challenge. To address this task, we propose a statistical model of promoter DNA sequences that can use information on exact positions of the transcription start sites and condition-dependent expression profiles. Two main novelties are to allow overlaps between motif occurrences and to incorporate covariates summarising expression profiles (e.g. coordinates in projection spaces or hierarchical clustering trees). All parameters are estimated using a dedicated trans-dimensional Markov chain Monte Carlo algorithm that adjusts, simultaneously, for many motifs and many expression covariates: the width and palindromic properties of the corresponding position-weight matrices, the number of parameters to describe position with respect to the transcription start site, and the choice of relevant expression covariates. A data-set of transcription start sites and expression profiles available for the Listeria monocytogenes is analysed. The results validate the approach and provide a new global view of the transcription regulatory network of this important model food-borne pathogen. A previously unreported motif that may play an important role in the regulation of growth was found in promoter regions of ribosomal protein genes.
Publisher
Cold Spring Harbor Laboratory