Transformation and Preprocessing of Single-Cell RNA-Seq Data


Constantin Ahlmann-Eltze, Wolfgang Huber


Abstract

The count table, a numeric matrix of genes × cells, is the basic input data structure in the analysis of single-cell RNA-seq data. A common preprocessing step is to adjust the counts for variable sampling efficiency and to transform them so that the variance is similar across the dynamic range. These steps are intended to make subsequent application of generic statistical methods more palatable. Here, we describe three transformations (based on the delta method, model residuals, or inferred latent expression state) and compare their strengths and weaknesses. We find that although the residuals-based and latent expression state-based models have appealing theoretical properties, in benchmarks using simulated and real-world data the simple shifted logarithm in combination with principal component analysis performs surprisingly well.

Software

An R package implementing the delta method and residuals-based variance-stabilizing transformations is available on
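
To make the preprocessing described in the abstract concrete, the following is a minimal sketch of two delta-method-style transformations applied to a small genes × cells count table: the shifted logarithm and an acosh-based variant. It is not the authors' R package. The function names, the size-factor definition (each cell's total count divided by the mean total across cells), the pseudo-count of 1, and the overdispersion value of 0.05 are all assumptions chosen for illustration, and the sketch assumes every cell has at least one count.

```python
import math


def size_factors(counts):
    """Per-cell size factors for a genes x cells table (list of gene rows).

    Assumption: a cell's size factor is its total count divided by the
    mean total count over all cells, so the factors average to 1.
    """
    n_cells = len(counts[0])
    totals = [sum(gene[c] for gene in counts) for c in range(n_cells)]
    mean_total = sum(totals) / n_cells
    return [t / mean_total for t in totals]


def shifted_log(counts, pseudo_count=1.0):
    """Shifted logarithm: log(y / s + pseudo_count) for each count y."""
    s = size_factors(counts)
    return [
        [math.log(y / s_c + pseudo_count) for y, s_c in zip(gene, s)]
        for gene in counts
    ]


def acosh_transform(counts, overdispersion=0.05):
    """acosh-based variance-stabilizing transformation.

    Assumption: a gamma-Poisson mean-variance relation with a fixed,
    illustrative overdispersion, giving acosh(2 * a * y / s + 1) / sqrt(a).
    """
    s = size_factors(counts)
    a = overdispersion
    return [
        [math.acosh(2 * a * y / s_c + 1) / math.sqrt(a) for y, s_c in zip(gene, s)]
        for gene in counts
    ]


# Toy table: 2 genes x 2 cells; the second cell was sampled twice as deeply.
toy_counts = [[1, 2], [3, 6]]
toy_log = shifted_log(toy_counts)
toy_acosh = acosh_transform(toy_counts)
```

In the toy table the two cells differ only in sampling depth, so after size-factor adjustment each gene receives the same transformed value in both cells. In a fuller pipeline the transformed matrix would then be passed to principal component analysis, which is the combination the abstract reports as performing well.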


Cold Spring Harbor Laboratory

