Integrating single-cell RNA-seq datasets with substantial batch effects

Author:

Hrovatin Karin, Moinfar Amir Ali, Lapuerta Alejandro Tejada, Zappia Luke, Lengerich Ben, Kellis Manolis, Theis Fabian J.

Abstract

Computational methods for integrating scRNA-seq datasets often struggle to harmonize datasets with substantial differences driven by technical or biological variation, such as between different species, organoids and primary tissue, or different scRNA-seq protocols, including single-cell and single-nuclei. Given that many widely adopted and scalable methods are based on conditional variational autoencoders (cVAE), we hypothesize that machine learning interventions to standard cVAEs can improve batch effect removal while preserving biological variation more effectively. To test this, we assess four strategies applied to commonly used cVAE models: the previously proposed Kullback–Leibler divergence (KL) regularization tuning and adversarial learning, as well as cycle-consistency loss (previously applied to multi-omic integration) and the multimodal variational mixture of posteriors prior (VampPrior), which has not yet been applied to integration. We evaluated performance in three data settings, namely cross-species, organoid-tissue, and cell-nuclei integration. Cycle-consistency and VampPrior improved batch correction while retaining high biological preservation, with their combination further increasing performance. While adversarial learning led to the strongest batch correction, its preservation of within-cell type variation did not match that of VampPrior or cycle-consistency models, and it was also prone to mixing unrelated cell types that have different proportions across batches. KL regularization strength tuning had the least favorable performance, as it jointly removed biological and batch variation by reducing the number of effectively used embedding dimensions. Based on our findings, we recommend adopting the VampPrior in combination with the cycle-consistency loss for integrating datasets with substantial batch effects.
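To make the "multimodal" prior mentioned above concrete, the following is a minimal sketch of the log density of an equal-weight mixture of diagonal Gaussians, which is the general form a VampPrior takes. It is an illustration under stated assumptions, not the authors' implementation: in a VampPrior the component means and variances would come from the encoder evaluated at learned pseudo-inputs, whereas here they are passed in directly, and the function names `log_normal_diag` and `log_mixture_prior` are hypothetical.

```python
import math


def log_normal_diag(z, mean, var):
    """Log density of a diagonal Gaussian evaluated at point z."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def log_mixture_prior(z, means, variances):
    """Log density of an equal-weight mixture of diagonal Gaussians.

    Assumption: `means` and `variances` stand in for the encoder outputs
    at the pseudo-inputs; they are supplied directly in this sketch.
    """
    logs = [log_normal_diag(z, m, v) for m, v in zip(means, variances)]
    top = max(logs)
    # Log-sum-exp for numerical stability, then subtract log K
    # because all K components carry equal weight.
    log_sum = top + math.log(sum(math.exp(value - top) for value in logs))
    return log_sum - math.log(len(logs))


# A point near one mode is far more likely under a two-mode prior
# than under a single standard normal prior.
two_modes = log_mixture_prior([3.0], [[-3.0], [3.0]], [[1.0], [1.0]])
one_mode = log_mixture_prior([3.0], [[0.0]], [[1.0]])
print(two_modes > one_mode)
```

With a single standard-normal component the function reduces to the usual cVAE prior, so the sketch also shows how the mixture form generalizes the standard setup.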

Publisher

Cold Spring Harbor Laboratory
