Abstract
AbstractIn single-cell data workflows and modeling, distance metrics are commonly used in loss functions, model evaluation, and subpopulation analysis. However, these metrics behave differently depending on the source of variation, conditions and subpopulations in single-cell expression profiles due to data sparsity and high dimensionality. Thus, the metrics used for downstream tasks in this domain should be carefully selected. We establish a set of benchmarks with three evaluation measures, capturing desirable facets of absolute and relative distance behavior. Based on seven datasets using perturbation as ground truth, we evaluated 16 distance metrics applied to scRNA-seq data and demonstrated their application to three use cases. We find that linear metrics such as mean squared error (MSE) performed best across our three evaluation criteria. Therefore, we recommend the use of MSE for comparing single-cell RNA-seq populations and evaluating gene expression prediction models.
Publisher
Cold Spring Harbor Laboratory
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献