Large-Scale Quality Analysis of Published ChIP-seq Data

Published:2014-02-01 Issue:2 Volume:4 Page:209-223
ISSN:2160-1836
Container-title:G3 Genes|Genomes|Genetics
language:en

Author:

Marinov Georgi K¹, Kundaje Anshul²,³, Park Peter J⁴,⁵,⁶, Wold Barbara J¹

Affiliation:

1. Division of Biology, California Institute of Technology, Pasadena, California 91125

2. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

3. The Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142

4. Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115

5. Informatics Program, Children’s Hospital Boston, Boston, Massachusetts 02115

6. Division of Genetics, Brigham and Women’s Hospital, Boston, Massachusetts 02115

Abstract

ChIP-seq has become the primary method for identifying in vivo protein–DNA interactions on a genome-wide scale, with nearly 800 publications involving the technique appearing in PubMed as of December 2012. Individually and in aggregate, these data are an important and information-rich resource. However, uncertainties about data quality confound their use by the wider research community. Recently, the Encyclopedia of DNA Elements (ENCODE) project developed and applied metrics to objectively measure ChIP-seq data quality. The ENCODE quality analysis was useful for flagging datasets for closer inspection, eliminating or replacing poor data, and for driving changes in experimental pipelines. There had been no similarly systematic quality analysis of the large and disparate body of published ChIP-seq profiles. Here, we report a uniform analysis of vertebrate transcription factor ChIP-seq datasets in the Gene Expression Omnibus (GEO) repository as of April 1, 2012. The majority (55%) of datasets scored as being highly successful, but a substantial minority (20%) were of apparently poor quality, and another ∼25% were of intermediate quality. We discuss how different uses of ChIP-seq data are affected by specific aspects of data quality, and we highlight exceptional instances for which the metric values should not be taken at face value. Unexpectedly, we discovered that a significant subset of control datasets (i.e., no immunoprecipitation and mock immunoprecipitation samples) display an enrichment structure similar to successful ChIP-seq data. This can, in turn, affect peak calling and data interpretation. Published datasets identified here as high-quality comprise a large group that users can draw on for large-scale integrated analysis. In the future, ChIP-seq quality assessment similar to that used here could guide experimentalists at early stages in a study, provide useful input in the publication process, and be used to stratify ChIP-seq data for different community-wide uses.

Publisher

Oxford University Press (OUP)

Subject

Genetics (clinical), Genetics, Molecular Biology

Link

http://academic.oup.com/g3journal/article-pdf/4/2/209/37211712/g3journal0209.pdf

References: 203 articles.

1. Genome-wide mapping of Sox6 binding sites in skeletal muscle reveals both direct and indirect regulation of muscle terminal differentiation by Sox6.;An;BMC Dev. Biol.,2011

2. Wdr5 mediates self-renewal and reprogramming via the embryonic stem cell core transcriptional network.;Ang;Cell,2011

3. Mapping accessible chromatin regions using Sono-Seq.;Auerbach;Proc. Natl. Acad. Sci. USA,2009

4. Conserved molecular interactions within the HBO1 acetyltransferase complexes regulate cell proliferation.;Avvakumov;Mol. Cell. Biol.,2012

5. Bcl-6 and NF-κB cistromes mediate opposing regulation of the innate immune response.;Barish;Genes Dev.,2010

Cited by 119 articles.

1. aChIP is an efficient and sensitive ChIP-seq technique for economically important plant organs;Nature Plants;2024-08-23

2. Data quality assurance practices in research data repositories—A systematic literature review;Journal of the Association for Information Science and Technology;2024-08-07

3. Comprehensive multimodal and multiomic profiling reveals epigenetic and transcriptional reprogramming in lung tumors;2024-06-08

4. An updated compendium and reevaluation of the evidence for nuclear transcription factor occupancy over the mitochondrial genome;2024-06-06

5. Antipsychotic-induced epigenomic reorganization in frontal cortex of individuals with schizophrenia;eLife;2024-04-22