Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols

Published: 2020-04-13
Journal: RNA, Volume 26, Issue 8, pages 903-909
ISSN: 1355-8382
Language: English

Authors: Shanrong Zhao, Zhan Ye, Robert Stanton

Abstract

In recent years, RNA sequencing (RNA-seq) has emerged as a powerful technology for transcriptome profiling. For a given gene, the number of mapped reads depends not only on its expression level and gene length but also on sequencing depth. To normalize for these dependencies, RPKM (reads per kilobase of transcript per million reads mapped) and TPM (transcripts per million) are used to measure gene or transcript expression levels. A common misconception is that RPKM and TPM values are already normalized and should therefore be comparable across samples or RNA-seq projects. However, RPKM and TPM represent the relative abundance of a transcript within a population of sequenced transcripts, and therefore depend on the composition of the RNA population in a sample. It is often reasonable to assume that total RNA concentration and distribution are very similar across the samples being compared. Nevertheless, the sequenced RNA repertoires may differ significantly under different experimental conditions and/or across sequencing protocols; in such cases, the proportion of gene expression is not directly comparable. In this review, we illustrate typical scenarios in which RPKM and TPM are unintentionally misused, and we hope to raise scientists' awareness of this issue when comparing them across samples or sequencing protocols.
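To make the compositional dependence concrete, here is a minimal sketch, not the authors' code, that assumes the standard definitions RPKM_i = reads_i × 10^9 / (length_i × total mapped reads) and TPM_i = 10^6 × (reads_i / length_i) / Σ_j (reads_j / length_j). The gene names, lengths, and read counts are hypothetical toy values. It further assumes that reads are drawn in proportion to (molecule count × transcript length) and that both libraries are sequenced to the same depth, so that only gene C's absolute expression changes between samples.

```python
# Minimal sketch: RPKM and TPM from raw read counts (toy, hypothetical data).


def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_mapped = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total_mapped) for g in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to sum to 1e6."""
    per_kb = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    total_per_kb = sum(per_kb.values())
    return {g: v / total_per_kb * 1e6 for g, v in per_kb.items()}


# Hypothetical genes; lengths in base pairs.
lengths = {"A": 1000, "B": 2000, "C": 1000}

# Assumption: reads are sampled in proportion to (molecules x length), and both
# libraries have 4,000 mapped reads. In sample 2 only gene C is induced
# (5x more molecules); genes A and B are unchanged in absolute abundance.
sample1 = {"A": 1000, "B": 2000, "C": 1000}
sample2 = {"A": 500, "B": 1000, "C": 2500}

tpm1, tpm2 = tpm(sample1, lengths), tpm(sample2, lengths)
rpkm1, rpkm2 = rpkm(sample1, lengths), rpkm(sample2, lengths)

for gene in lengths:
    print(f"{gene}: TPM {tpm1[gene]:.1f} -> {tpm2[gene]:.1f}, "
          f"RPKM {rpkm1[gene]:.1f} -> {rpkm2[gene]:.1f}")
```

Although gene A's absolute abundance is identical in both toy samples, its TPM falls from about 333,333 to about 142,857, and its RPKM halves from 250,000 to 125,000. This happens only because gene C now occupies a larger share of the sequenced pool, which illustrates why these within-sample proportions are not directly comparable when RNA composition differs.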

Publisher: Cold Spring Harbor Laboratory

Subject: Molecular Biology


Cited by 199 articles.

1. Exploring ecological effects of arsenic and cadmium combined exposure on cropland soil: from multilevel organisms to soil functioning by multi-omics coupled with high-throughput quantitative PCR;Journal of Hazardous Materials;2024-03

2. Biomineralization in a cold environment: Insights from shield compositions and transcriptomics of polar sternaspids (Sternaspidae, Polychaeta);Comparative Biochemistry and Physiology Part D: Genomics and Proteomics;2024-03

3. Differentially expressed genes of RNA-seq data are suggested on the intersections of normalization techniques;Biochemistry and Biophysics Reports;2024-03

4. Proteotransciptomics of the Most Popular Host Sea Anemone Entacmaea quadricolor Reveals Not All Toxin Genes Expressed by Tentacles Are Recruited into Its Venom Arsenal;Toxins;2024-02-05

5. Integration of bulk RNA sequencing to reveal protein arginine methylation regulators have a good prognostic value in immunotherapy to treat lung adenocarcinoma;Heliyon;2024-02