Abstract
Abstract
Background
Spatial transcriptomics are a set of new technologies that profile gene expression on tissues with spatial localization information. With technological advances, recent spatial transcriptomics data are often in the form of sparse counts with an excessive amount of zero values.
Results
We perform a comprehensive analysis on 20 spatial transcriptomics datasets collected from 11 distinct technologies to characterize the distributional properties of the expression count data and understand the statistical nature of the zero values. Across datasets, we show that a substantial fraction of genes displays overdispersion and/or zero inflation that cannot be accounted for by a Poisson model, with genes displaying overdispersion substantially overlapped with genes displaying zero inflation. In addition, we find that either the Poisson or the negative binomial model is sufficient for modeling the majority of genes across most spatial transcriptomics technologies. We further show major sources of overdispersion and zero inflation in spatial transcriptomics including gene expression heterogeneity across tissue locations and spatial distribution of cell types. In particular, when we focus on a relatively homogeneous set of tissue locations or control for cell type compositions, the number of detected overdispersed and/or zero-inflated genes is substantially reduced, and a simple Poisson model is often sufficient to fit the gene expression data there.
Conclusions
Our study provides the first comprehensive evidence that excessive zeros in spatial transcriptomics are not due to zero inflation, supporting the use of count models without a zero inflation component for modeling spatial transcriptomics.
Funder
National Institutes of Health
Publisher
Springer Science and Business Media LLC
Cited by
32 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献