CellMixS: quantifying and visualizing batch effects in single cell RNA-seq data

Author:

Lütge AlmutORCID,Zyprych-Walczak JoannaORCID,Kunzmann Urszula Brykczynska,Crowell HelenaLORCID,Calini Daniela,Malhotra DheerajORCID,Soneson CharlotteORCID,Robinson Mark DORCID

Abstract

AbstractA key challenge in single cell RNA-sequencing (scRNA-seq) data analysis are dataset- and batch-specific differences that can obscure the biological signal of interest. While there are various tools and methods to perform data integration and correct for batch effects, their performance can vary between datasets and according to the nature of the bias. Therefore, it is important to understand how batch effects manifest in order to adjust for them in a reliable way. Here, we systematically explore batch effects in a variety of scRNA-seq datasets according to magnitude, cell type specificity and complexity.We developed a cell-specific mixing score (cms) that quantifies how well cells from multiple batches are mixed. By considering distance distributions (in a lower dimensional space), the score is able to detect local batch bias and differentiate between unbalanced batches (i.e., when one cell type is more abundant in a batch) and systematic differences between cells of the same cell type. We implemented cms and related metrics to detect batch effects or measure structure preservation in the CellMixS R/Bioconductor package.We systematically compare different metrics that have been proposed to quantify batch effects or bias in scRNA-seq data using real datasets with known batch effects and synthetic data that mimic various real data scenarios. While these metrics target the same question and are used interchangeably, we find differences in inter- and intra-dataset scalability, sensitivity and in a metric’s ability to handle batch effects with differentially abundant cell types. We find that cell-specific metrics outperform cell type-specific and global metrics and recommend them for both method benchmarks and batch exploration.

Publisher

Cold Spring Harbor Laboratory

Cited by 1 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Single-Cell Analysis for Whole-Organism Datasets;Annual Review of Biomedical Data Science;2021-07-20

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3