Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios
Published:2024-06-03
Issue:1
Volume:25
Page:
ISSN:1474-760X
Container-title:Genome Biology
language:en
Short-container-title:Genome Biol
Author:
Duo Hongrui, Li Yinghong, Lan Yang, Tao Jingxin, Yang Qingxia, Xiao Yingxue, Sun Jing, Li Lei, Nie Xiner, Zhang Xiaoxi, Liang Guizhao, Liu Mingwei, Hao Youjin, Li Bo
Abstract
Background
Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. Data simulation has been widely adopted for developing bioinformatics tools for scRNA-seq and SRT data and for performing unbiased benchmarks, because it provides explicit ground truth and generates customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, and without practical guidelines it is challenging to choose suitable methods.
Results
We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability, using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 achieve the best accuracy across various platforms. Unexpectedly, some methods tailored to scRNA-seq data are potentially compatible with simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under their corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores, but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimation and the appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe (https://github.com/duohongrui/simpipe; https://doi.org/10.5281/zenodo.11178409), and an online tool Simsite (https://www.ciblab.net/software/simshiny/) for data simulation.
Conclusions
No method performs best on all criteria; thus, a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.
Funder
Science and Technology Research Program of Chongqing Municipal Education Commission; National Natural Science Foundation of China; China Postdoctoral Science Foundation; Graduate Research Innovation Project of Chongqing Normal University; Open Fund of Yunnan Key Laboratory of Plant Reproductive Adaptation and Evolutionary Ecology
Publisher
Springer Science and Business Media LLC