Author:
Wu Zhenfeng, Liu Weixiang, Jin Xiufeng, Yu Deshui, Wang Hua, Glusman Gustavo, Robinson Max, Liu Lin, Ruan Jishou, Shan Gao
Abstract
Data normalization is a crucial step in gene expression analysis because it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate current normalization methods, different metrics yield inconsistent results. In this study, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods. The two metrics gave consistent evaluation results on both bulk RNA-seq and scRNA-seq data from the same library construction protocol. This consistency validates the underlying theory that a successful normalization method simultaneously maximizes the number of uniform genes and minimizes the correlation between the expression profiles of gene pairs. The same consistency can also be used to assess the quality of gene expression data. The gene expression data, normalization methods and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to evaluate methods (particularly data-driven methods or their own methods) and then select the best one for data normalization in gene expression analysis.
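The idea behind an area-under-CV-threshold-curve metric can be illustrated with a minimal sketch. The abstract does not give the exact definition of AUCVC, so everything below is an assumption made for illustration: the coefficient of variation (CV) is taken per gene across samples, a gene counts as "uniform" when its CV is at or below a threshold, the thresholds run from 0 to 1, and the area is computed with the trapezoidal rule. The function names `cv_threshold_curve` and `aucvc` are hypothetical and are not the NormExpression API.

```python
import numpy as np


def cv_threshold_curve(expr, thresholds):
    """Fraction of genes whose CV across samples is <= each threshold.

    expr: array of shape (genes, samples). Genes with zero mean are skipped
    (an assumption; the paper's gene filtering is not stated in the abstract).
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    std = expr.std(axis=1, ddof=1)
    keep = mean > 0
    cv = std[keep] / mean[keep]
    return np.array([(cv <= t).mean() for t in thresholds])


def aucvc(expr, thresholds=None):
    """Area under the (threshold, fraction of uniform genes) curve.

    The default threshold grid on [0, 1] is an assumption for this sketch.
    """
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    fractions = cv_threshold_curve(expr, thresholds)
    # Trapezoidal rule written out explicitly.
    widths = np.diff(thresholds)
    heights = (fractions[1:] + fractions[:-1]) / 2.0
    return float(np.sum(widths * heights))


# Toy data: 200 genes, 6 samples that differ only by a depth factor plus noise.
rng = np.random.default_rng(0)
base = rng.gamma(shape=50.0, scale=2.0, size=(200, 1))  # per-gene level
depth = np.array([1.0, 2.0, 0.5, 4.0, 1.5, 3.0])        # per-sample factor
noise = rng.normal(1.0, 0.05, size=(200, 6))
raw = base * depth * noise
normalized = raw / depth  # divide out the known factors

print("raw:", aucvc(raw))
print("normalized:", aucvc(normalized))
```

In this toy setting, dividing out the per-sample factors makes most genes uniform at low thresholds, so the normalized matrix scores a larger area than the raw one. Comparing normalization methods on real data would mean computing this value once per method, with each method supplying its own factors.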
Publisher
Cold Spring Harbor Laboratory