Abstract
AbstractBackgroundQuality control (QC) of cells, a critical first step in single-cell RNA sequencing data analysis, has largely relied on arbitrarily fixed data-agnostic thresholds applied to QC metrics such as gene complexity and fraction of reads mapping to mitochondrial genes. The few existing data-driven approaches perform QC at the level of samples or studies without accounting for biological variation.ResultsWe first demonstrate that QC metrics vary with both tissue and cell types across technologies, study conditions, and species. We then propose data-driven QC (ddqc), an unsupervised adaptive QC framework to perform flexible and data-driven QC at the level of cell types while retaining critical biological insights and improved power for downstream analysis.ddqcapplies an adaptive threshold based on the median absolute deviation on four QC metrics (gene and UMI complexity, fraction of reads mapping to mitochondrial and ribosomal genes).ddqcretains over a third more cells when compared to conventional data-agnostic QC filters. Finally, we show thatddqcrecovers biologically meaningful trends in gradation of gene complexity among cell types that can help answer questions of biological interest such as which cell types express the least and most number of transcripts overall, and ribosomal transcripts specifically.Conclusionsddqc retains cell types such as metabolically active parenchymal cells and specialized cells such as neutrophils which are often lost by conventional QC. Taken together, our work proposes a revised paradigm to quality filtering best practices—iterative QC, providing a data-driven QC framework compatible with observed biological diversity.
Publisher
Springer Science and Business Media LLC
Reference91 articles.
1. Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol. 2019;15:e8746 EMBO.
2. Ilicic T, Kim JK, Kolodziejczyk AA, Bagger FO, McCarthy DJ, Marioni JC, et al. Classification of low quality cells from single-cell RNA-seq data. Genome Biol. 2016;17:29.
3. Bach K, Pensa S, Grzelak M, Hadfield J, Adams DJ, Marioni JC, et al. Differentiation dynamics of mammary epithelial cells revealed by single-cell RNA sequencing. Nat Commun. 2017;8:2128.
4. Young MD, Behjati S. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. Gigascience. 2020;9. https://doi.org/10.1093/gigascience/giaa151.
5. Fleming SJ, Marioni JC, Babadi M. CellBender remove-background: a deep generative model for unsupervised removal of background noise from scRNA-seq datasets. Cold Spring Harbor Laboratory. 2019:791699 [cited 2021 Mar 4]. Available from: https://www.biorxiv.org/content/10.1101/791699v1.abstract.