Author:
Singh Vikas, Kirtipal Nikhil, Song Byong-Sop, Lee Sunjae
Abstract
The normalization of RNA sequencing data is a primary step for downstream analysis. The most popular normalization methods are the trimmed mean of M values (TMM) and DESeq. TMM trims away extreme log fold changes in the data and normalizes the raw read counts based on the remaining non-differentially expressed genes. However, the major problem with TMM is that the value of the trimming factor for M is chosen heuristically. This paper estimates an adaptive value of M in TMM based on Jaeckel's Estimator, with each sample acting in turn as a reference to find the scale factor of each sample. The presented approach is validated on the SEQC, MAQC2, MAQC3, and PICKRELL datasets and on two simulated datasets with two-group and three-group conditions, varying the percentage of differential expression and the number of replicates. Compared with other normalization methods, the present approach performs better in terms of area under the receiver operating characteristic curve (AUC) and differential expression. The implementation of the present approach is available on GitHub: https://github.com/vikkyak/Normalization-of-Bulk-RNA-seq.
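To make the trimming idea concrete, the following is a minimal Python sketch of a simplified, unweighted TMM-style scale factor with a fixed trim fraction. It is not the authors' implementation and not the full TMM procedure (which also trims on absolute expression and uses precision weights). The function name `tmm_like_factor` and the parameter `trim_m` are hypothetical, introduced only for illustration, and the adaptive choice of the trim fraction via Jaeckel's Estimator described in the abstract is not shown.

```python
import numpy as np

def tmm_like_factor(sample, reference, trim_m=0.3):
    """Simplified, unweighted TMM-style scale factor of `sample` vs `reference`.

    Assumption: `trim_m` is a fixed fraction (below 0.5) trimmed from each tail
    of the M values; the abstract's adaptive choice of this value is not modeled.
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Keep genes with nonzero counts in both libraries so the log ratio is finite.
    keep = (sample > 0) & (reference > 0)

    # M value: log2 ratio of library-size-normalized counts for each gene.
    m = np.log2((sample[keep] / sample.sum()) / (reference[keep] / reference.sum()))

    # Trim the same number of genes from each tail of the sorted M values.
    m_sorted = np.sort(m)
    k = int(np.floor(trim_m * m_sorted.size))
    trimmed = m_sorted[k:m_sorted.size - k]

    # The scale factor is 2 raised to the mean of the remaining M values.
    return float(2.0 ** trimmed.mean())


# Toy data (made up): nine unchanged genes and one strongly up-regulated gene.
reference = [100] * 10
sample = [100] * 9 + [1100]
factor = tmm_like_factor(sample, reference)
```

In this toy case the single up-regulated gene doubles the library size of `sample`, so the unchanged genes all share the same negative M value. Trimming discards the extreme gene, and the resulting factor reflects only the composition shift seen by the unchanged genes. How well this works in practice depends on the trim fraction, which is the quantity the abstract proposes to estimate adaptively.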
Publisher
Cold Spring Harbor Laboratory