GC-Content Normalization for RNA-Seq Data

Published: 2011-12 Issue: 1 Volume: 12 Page:
ISSN: 1471-2105
Container-title: BMC Bioinformatics
Language: en
Short-container-title: BMC Bioinformatics

Author:

Risso Davide, Schwartz Katja, Sherlock Gavin, Dudoit Sandrine

Abstract

Background: Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof.

Results: We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq.

Conclusions: Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.
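
The abstract does not spell out the three normalization approaches, so the following is only a rough illustration of what a within-lane, gene-level GC-content adjustment can look like. It bins the genes of one lane by GC-content and shifts each bin's log-counts so that all bins share the same median. The function name `gc_median_normalize`, the number of bins, and the pseudo-count of 1 are assumptions made for this sketch; it is not the EDASeq implementation and is written in Python rather than R.

```python
import numpy as np

def gc_median_normalize(counts, gc, n_bins=10):
    """Illustrative within-lane GC adjustment for a single lane.

    counts : per-gene read counts for one lane
    gc     : per-gene GC-content (fraction between 0 and 1)
    n_bins : number of equal-frequency GC bins (assumed value)
    """
    counts = np.asarray(counts, dtype=float)
    gc = np.asarray(gc, dtype=float)

    # Work on the log scale; the pseudo-count of 1 avoids log(0).
    log_counts = np.log(counts + 1.0)

    # Equal-frequency bins: edges are quantiles of the GC-content values.
    edges = np.quantile(gc, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, gc, side="right") - 1, 0, n_bins - 1)

    # Shift every bin so that its median matches the lane-wide median.
    overall_median = np.median(log_counts)
    adjusted = log_counts.copy()
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            adjusted[in_bin] += overall_median - np.median(log_counts[in_bin])

    # Back to the count scale; clip so that no value becomes negative.
    return np.maximum(np.exp(adjusted) - 1.0, 0.0)
```

Applied to each lane separately, a step like this would come before between-lane normalization, matching the order described in the Conclusions. A smooth regression fit in place of the bin medians would follow the same pattern.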

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-12-480.pdf

References: 37 articles.

1. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 2008, 320(5881):1344. 10.1126/science.1158441

2. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 2009, 10: 57–63. 10.1038/nrg2484

3. Bullard J, Purdom E, Hansen K, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 2010, 11: 94. 10.1186/1471-2105-11-94

4. Marioni J, Mason C, Mane S, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Research 2008, 18(9):1509. 10.1101/gr.079558.108

5. Mortazavi A, Williams B, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 2008, 5(7):621–628. 10.1038/nmeth.1226

Cited by 680 articles.

1. Enhanced expression of the myogenic factor Myocyte enhancer factor-2 in imaginal disc myoblasts activates a partial, but incomplete, muscle development program;Developmental Biology;2024-12

2. Normalization of gene counts affects principal components-based exploratory analysis of RNA-sequencing data;Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms;2024-12

3. Characterization of circulating tumor cells in patients with metastatic bladder cancer utilizing functionalized microfluidics;Neoplasia;2024-11

4. Robust estimation of cancer and immune cell-type proportions from bulk tumor ATAC-Seq data;2024-09-10