Abstract
Library size normalisation is necessary to enable comparisons between observations in transcriptomic datasets. Numerous methods have been developed to correct library size effects with sample- and gene-specific adjustments. However, in spatial transcriptomics data, normalisation is complicated by the fact that spatial region-specific library size is confounded with biology. The most popular approach, adapting methods developed for single-cell RNA-seq data, has been shown to excessively remove biological signals associated with spatial domains and thus results in poorer downstream domain identification. To address this, we propose SpaNorm, the first spatially-aware normalisation method. SpaNorm concurrently models spatial library size effects and the underlying smooth biology in order to tease these effects apart, and thereby removes library size effects without removing biology. This is achieved through optimal decomposition of spatially smooth variation into components related and unrelated to library size, and through the use of location-specific scaling factors. Using 27 tissue samples from 6 datasets spanning 4 spatial platforms, we show that SpaNorm outperforms 4 commonly used, state-of-the-art single-cell normalisation approaches at retaining biological information in the form of spatial domains and spatially variable genes (SVGs). SpaNorm is versatile and can be used for both spot-based and subcellular spatial transcriptomics data. Notably, the benefit of using SpaNorm is more pronounced for the latter, such as data from the Xenium, STOmics and CosMx platforms, for which the proportion of genes exhibiting region-specific library size effects is higher. SpaNorm works equally well with segmented cell-level data and spot-based data where each spot contains multiple cells.
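The confounding described above can be illustrated with a small toy example. The sketch below is not SpaNorm's algorithm; it uses hypothetical, noise-free data in which one region has higher expression of every gene for biological reasons, plus a per-observation technical capture efficiency. All names and values (`region`, `efficiency`, `oracle_norm`, and so on) are assumptions made for illustration. It shows that scaling each observation by its own library size removes the between-region difference, whereas scaling by the technical component alone, which is assumed known here but would have to be estimated in practice, preserves it.

```python
import numpy as np

# Hypothetical toy data: 6 observations (spots or cells) x 3 genes.
# Observations 0-2 lie in region A, observations 3-5 in region B.
region = np.array(["A", "A", "A", "B", "B", "B"])

# Assumed biology: region B expresses every gene at twice the level of region A.
base_expression = np.array([10.0, 20.0, 30.0])
biology = np.where(region[:, None] == "B", 2.0, 1.0) * base_expression

# Assumed technical effect: capture efficiency that is unrelated to region.
efficiency = np.array([0.5, 1.0, 1.5, 0.5, 1.0, 1.5])
counts = biology * efficiency[:, None]

# Per-observation library size scaling, as commonly done for single-cell data.
library_size = counts.sum(axis=1)
global_norm = counts / library_size[:, None] * library_size.mean()

# "Oracle" scaling that divides out only the technical component.
oracle_norm = counts / efficiency[:, None]


def region_fold_change(x, gene=0):
    """Mean expression of one gene in region B divided by that in region A."""
    return x[region == "B", gene].mean() / x[region == "A", gene].mean()


print("true fold change (B / A):        ", region_fold_change(biology))
print("after library size scaling:      ", region_fold_change(global_norm))
print("after technical-only scaling:    ", region_fold_change(oracle_norm))
```

In this toy setting the library size itself carries the regional biology, so dividing by it flattens the two-fold difference between regions, while removing only the technical factor leaves that difference intact.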
Publisher
Cold Spring Harbor Laboratory