Annotation of gene promoters by integrative data-mining of ChIP-seq Pol-II enrichment data

Published: 2010-01 Issue: S1 Volume: 11 Page:
ISSN: 1471-2105
Container-title: BMC Bioinformatics
Language: en
Short-container-title: BMC Bioinformatics

Author:

Gupta Ravi, Wikramasinghe Priyankara, Bhattacharyya Anirban, Perez Francisco A, Pal Sharmistha, Davuluri Ramana V

Abstract

Background: Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarrays (ChIP-chip) or massively parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and in other genomic regions, due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context.

Methods: We trained and tested different state-of-the-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a benchmark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied to seven published ChIP-seq Pol-II datasets to provide genome-wide annotation of mouse gene promoters.

Results: We present a novel algorithm based on supervised learning methods to discriminate promoter-associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy. We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The resulting annotations contain 13,413 (4,747) protein-coding (non-coding) genes with single promoters and 9,929 (1,858) protein-coding (non-coding) genes with two or more alternative promoters, and a significant number of unassigned novel promoters.

Conclusion: Our new algorithm can successfully predict promoters from the genome-wide profile of Pol-II bound regions. In addition, our algorithm performs significantly better than existing promoter prediction methods and can be applied for genome-wide predictions of Pol-II promoters.
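
The abstract mentions 500 bp sequences described by "various DNA sequence features" but does not list them. As a rough illustration only, the sketch below cuts a sequence into 500 bp windows and computes two features that are commonly used for promoter sequences: GC content and the observed/expected CpG ratio. The choice of these two features, and the names `sequence_features` and `windows`, are assumptions made for this example; they are not taken from the paper.

```python
from typing import Dict, Iterator, Tuple

WINDOW = 500  # window length in bp, matching the sequence length in the abstract


def sequence_features(seq: str) -> Dict[str, float]:
    """Compute two illustrative sequence features for one window.

    These features are assumptions for this sketch; the paper's actual
    39 feature variables are not listed in the abstract.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides ("CG" cannot overlap itself)
    gc_content = (c + g) / n
    # Observed/expected CpG ratio: (#CpG * N) / (#C * #G); 0.0 if C or G is absent.
    cpg_oe = (cpg * n) / (c * g) if c and g else 0.0
    return {"gc_content": gc_content, "cpg_oe": cpg_oe}


def windows(seq: str, size: int = WINDOW, step: int = WINDOW) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window; a short tail is dropped."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


if __name__ == "__main__":
    toy = "CG" * 250 + "AT" * 250  # 1,000 bp toy sequence, not real data
    for start, win in windows(toy):
        print(start, sequence_features(win))
```

In a pipeline like the one the abstract describes, feature vectors of this kind would be combined with chromatin-modification signals and passed to an ensemble classifier such as Bagging or Random Forest; that step is outside the scope of this sketch.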

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-11-S1-S65.pdf

Reference: 50 articles.

1. Sun H, Palaniswamy SK, Pohar TT, Jin VX, Huang TH, Davuluri RV: MPromDb: an integrated resource for annotation and visualization of mammalian gene promoters and ChIP-chip experimental data. Nucleic Acids Res 2006, 34(Database issue):D98–103. 10.1093/nar/gkj096

2. Baek D, Davis C, Ewing B, Gordon D, Green P: Characterization and predictive discovery of evolutionarily conserved mammalian alternative promoters. Genome Res 2007, 17(2):145–155. 10.1101/gr.5872707

3. Cooper SJ, Trinklein ND, Anton ED, Nguyen L, Myers RM: Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome Res 2006, 16(1):1–10. 10.1101/gr.4222606

4. Kawaji H, Severin J, Lizio M, Waterhouse A, Katayama S, Irvine KM, Hume DA, Forrest AR, Suzuki H, Carninci P, et al.: The FANTOM web resource: from mammalian transcriptional landscape to its dynamic regulation. Genome Biol 2009, 10(4):R40. 10.1186/gb-2009-10-4-r40

5. Davuluri RV, Suzuki Y, Sugano S, Plass C, Huang TH: The functional consequences of alternative promoter use in mammalian genomes. Trends Genet 2008, 24(4):167–177. 10.1016/j.tig.2008.01.008

Cited by 25 articles.

1. The transcription factor Bcl11b promotes both canonical and adaptive NK cell differentiation;Science Immunology;2021-03-04

2. Tumor-Based Genetic Testing and Familial Cancer Risk;Cold Spring Harbor Perspectives in Medicine;2019-09-30

3. Platform-Independent Gene-Expression Based Classification-System for Molecular Sub-typing of Cancer;Health Informatics;2019-09-18

4. Learning to predict single-wall carbon nanotube-recognition DNA sequences;npj Computational Materials;2019-01-10

5. Novel promoters and coding first exons in DLG2 linked to developmental disorders and intellectual disability;Genome Medicine;2017-07-19