A Bayesian framework for inter-cellular information sharing improves dscRNA-seq quantification
Author:
Srivastava Avi, Malik Laraib, Sarkar Hirak, Patro Rob
Abstract
Motivation: Droplet-based single-cell RNA-seq (dscRNA-seq) data is being generated at an unprecedented pace, and the accurate estimation of gene-level abundances for each cell is a crucial first step in most dscRNA-seq analyses. When preprocessing the raw dscRNA-seq data to generate a count matrix, care must be taken to account for the potentially large number of multi-mapping locations per read. The sparsity of dscRNA-seq data and the strong 3’ sampling bias make it difficult to disambiguate cases where no read maps uniquely to any of the candidate target genes.
Results: We introduce a Bayesian framework for information sharing across cells within a sample, or across multiple modalities of data using the same sample, to improve gene quantification estimates for dscRNA-seq data. We use an anchor-based approach to connect cells with similar gene expression patterns, and learn informative, empirical priors which we provide to alevin’s gene multi-mapping resolution algorithm. This improves the quantification estimates for genes with no uniquely mapping reads (i.e. when there is no unique intra-cellular information). We show that our new model improves the per-cell gene-level estimates and provides a principled framework for information sharing across multiple modalities. We test our method on a combination of simulated and real datasets under various setups.
Availability: The information sharing model is included in alevin and is implemented in C++14. It is available as open-source software, under GPL v3, at https://github.com/COMBINE-lab/salmon as of version 1.1.0.
Contact: asrivastava@cs.stonybrook.edu, rob@cs.umd.edu
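The role of an informative prior in multi-mapping resolution, as described in the Results, can be illustrated with a toy sketch. It is not alevin's implementation: it assumes a plain EM loop over equivalence classes in which the prior enters as per-gene pseudo-counts, and the function name `em_with_prior`, the input layout, and the example numbers are all hypothetical.

```python
def em_with_prior(eq_classes, genes, prior=None, n_iter=200):
    """Toy EM for splitting ambiguous read counts among genes.

    eq_classes: list of (tuple_of_gene_names, read_count) pairs, where each
                tuple lists the genes a group of reads is compatible with.
    prior:      optional dict gene -> pseudo-count (hypothetical stand-in
                for a prior learned from similar cells).
    Returns a dict gene -> expected read count.
    """
    prior = prior or {}
    alpha = {g: float(prior.get(g, 0.0)) for g in genes}
    theta = {g: 1.0 / len(genes) for g in genes}  # uniform starting point
    counts = {g: 0.0 for g in genes}

    for _ in range(n_iter):
        # E-step: split each class's reads in proportion to current theta.
        counts = {g: 0.0 for g in genes}
        for members, n in eq_classes:
            denom = sum(theta[g] for g in members)
            if denom == 0.0:
                continue  # no compatible gene has support; skip in this toy
            for g in members:
                counts[g] += n * theta[g] / denom

        # M-step: re-estimate theta from expected counts plus pseudo-counts.
        total = sum(counts[g] + alpha[g] for g in genes)
        if total == 0.0:
            break
        theta = {g: (counts[g] + alpha[g]) / total for g in genes}

    return counts


# Ten reads are compatible with both genes; neither gene has a unique read.
ambiguous = [(("geneA", "geneB"), 10)]
flat = em_with_prior(ambiguous, ["geneA", "geneB"])
informed = em_with_prior(ambiguous, ["geneA", "geneB"],
                         prior={"geneA": 9.0, "geneB": 1.0})
```

Without a prior, the ten ambiguous reads stay split evenly between the two genes, because nothing inside the cell distinguishes them. With the hypothetical 9:1 pseudo-counts, the estimate moves to approximately nine reads for `geneA` and one for `geneB`, which is the kind of case (no uniquely mapping reads) the abstract says the shared information is meant to help.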
Publisher
Cold Spring Harbor Laboratory