A Bayesian framework for inter-cellular information sharing improves dscRNA-seq quantification

Author:

Srivastava Avi, Malik Laraib, Sarkar Hirak, Patro Rob

Abstract

Motivation: Droplet-based single-cell RNA-seq (dscRNA-seq) data is being generated at an unprecedented pace, and the accurate estimation of gene-level abundances for each cell is a crucial first step in most dscRNA-seq analyses. When preprocessing the raw dscRNA-seq data to generate a count matrix, care must be taken to account for the potentially large number of multi-mapping locations per read. The sparsity of dscRNA-seq data and the strong 3’ sampling bias make it difficult to disambiguate cases where no read maps uniquely to any of the candidate target genes.

Results: We introduce a Bayesian framework for information sharing across cells within a sample, or across multiple modalities of data using the same sample, to improve gene quantification estimates for dscRNA-seq data. We use an anchor-based approach to connect cells with similar gene expression patterns, and learn informative, empirical priors which we provide to alevin’s gene multi-mapping resolution algorithm. This improves the quantification estimates for genes with no uniquely mapping reads (i.e. when there is no unique intra-cellular information). We show that our new model improves the per-cell gene-level estimates and provides a principled framework for information sharing across multiple modalities. We test our method on a combination of simulated and real datasets under various setups.

Availability: The information sharing model is included in alevin and is implemented in C++14. It is available as open-source software, under GPL v3, at https://github.com/COMBINE-lab/salmon as of version 1.1.0.

Contact: asrivastava@cs.stonybrook.edu, rob@cs.umd.edu
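To make the role of the prior concrete, the toy sketch below shows how prior pseudo-counts can steer the assignment of reads that map ambiguously to several genes. It is an illustration only and is not alevin's implementation: the function name `resolve_multimapping`, the representation of reads as (gene set, count) equivalence classes, and the choice to add the prior as pseudo-counts in the M-step of a plain EM loop are all assumptions made for this example.

```python
def resolve_multimapping(eq_classes, n_genes, prior, n_iter=200):
    """Toy EM for splitting ambiguous read counts among genes.

    eq_classes: list of (tuple_of_gene_ids, read_count) pairs (hypothetical format).
    prior:      per-gene pseudo-counts, e.g. learned from similar cells (assumption).
    Returns the expected read count assigned to each gene.
    """
    theta = [1.0 / n_genes] * n_genes  # start from uniform gene proportions
    expected = [0.0] * n_genes
    for _ in range(n_iter):
        # E-step: split each class count in proportion to the current theta
        expected = [0.0] * n_genes
        for genes, count in eq_classes:
            denom = sum(theta[g] for g in genes)
            for g in genes:
                share = theta[g] / denom if denom > 0 else 1.0 / len(genes)
                expected[g] += count * share
        # M-step: re-estimate proportions from expected counts plus prior pseudo-counts
        total = sum(expected) + sum(prior)
        if total == 0:
            break  # no reads and no prior mass: nothing to estimate
        theta = [(expected[g] + prior[g]) / total for g in range(n_genes)]
    return expected


# Ten reads map to both gene 0 and gene 1; neither gene has a unique read.
ambiguous_only = [((0, 1), 10)]
flat = resolve_multimapping(ambiguous_only, n_genes=2, prior=[0.0, 0.0])
informed = resolve_multimapping(ambiguous_only, n_genes=2, prior=[4.0, 1.0])
print([round(x, 3) for x in flat])
print([round(x, 3) for x in informed])
```

With no prior, the ten ambiguous reads are split evenly because nothing within the cell distinguishes the two genes. With the hypothetical 4:1 prior, the split converges to the prior's ratio (about 8 reads to 2), which is the situation the abstract describes for genes that lack unique intra-cellular information.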

Publisher

Cold Spring Harbor Laboratory
