Abstract
Recent technology breakthroughs in spatially resolved transcriptomics (SRT) have enabled the comprehensive molecular characterization of cells whilst preserving their spatial and gene expression contexts. One of the fundamental questions in analyzing SRT data is the identification of spatially variable genes whose expressions display spatially correlated patterns. Existing approaches are built upon either the Gaussian process-based model, which relies on ad hoc kernels, or the energy-based Ising model, which requires gene expression to be measured on a lattice grid. To overcome these potential limitations, we developed a generalized energy-based framework to model gene expression measured from imaging-based SRT platforms, accommodating the irregular spatial distribution of measured cells. Our Bayesian model applies a zero-inflated negative binomial mixture model to dichotomize the raw count data, reducing noise. Additionally, we incorporate a geostatistical mark interaction model with a generalized energy function, where the interaction parameter is used to identify the spatial pattern. Auxiliary variable MCMC algorithms were employed to sample from the posterior distribution with an intractable normalizing constant. We demonstrated the strength of our method on both simulated and real data. Our simulation study showed that our method captured various spatial patterns with high accuracy; moreover, analysis of a seqFISH dataset and a STARmap dataset established that our proposed method is able to identify genes with novel and strong spatial patterns.
Publisher
Cold Spring Harbor Laboratory