Benchmarking atlas-level data integration in single-cell genomics

Published: 2020-05-23

Authors:

Luecken MD, Büttner M, Chaichoompu K, Danese A, Interlandi M, Mueller MF, Strobl DC, Zappia L, Dugas M, Colomé-Tatché M, Theis FJ

Abstract

Cell atlases often include samples that span locations, labs, and conditions, leading to complex, nested batch effects in data. Thus, joint analysis of atlas datasets requires reliable data integration. Choosing a data integration method is a challenge due to the difficulty of defining integration success. Here, we benchmark 38 method and preprocessing combinations on 77 batches of gene expression, chromatin accessibility, and simulation data from 23 publications, altogether representing >1.2 million cells distributed in nine atlas-level integration tasks. Our integration tasks span several common sources of variation such as individuals, species, and experimental labs. We evaluate methods according to scalability, usability, and their ability to remove batch effects while retaining biological variation. Using 14 evaluation metrics, we find that highly variable gene selection improves the performance of data integration methods, whereas scaling pushes methods to prioritize batch removal over conservation of biological variation. Overall, BBKNN, Scanorama, and scVI perform well, particularly on complex integration tasks; Seurat v3 performs well on simpler tasks with distinct biological signals; and methods that prioritize batch removal perform best for ATAC-seq data integration. Our freely available reproducible Python module can be used to identify optimal data integration methods for new data, benchmark new methods, and improve method development.

Publisher

Cold Spring Harbor Laboratory

References: 67 articles.

1. A Single Cell Transcriptomic Atlas Characterizes Aging Tissues in the Mouse

2. Highly Multiplexed Single-Cell RNA-seq for Defining Cell Population and Transcriptional Spaces

3. Benchmarking Single-Cell RNA Sequencing Protocols for Cell Atlas Projects

4. Regev, A. et al. The Human Cell Atlas White Paper. arXiv [q-bio.TO] (2018).

5. Single-cell RNA-seq analysis software providers scramble to offer solutions; Nature Biotechnology; 2020

Cited by 92 articles.

1. multiDGD: A versatile deep generative model for multi-omics data; 2023-08-23

2. Consensus prediction of cell type labels with popV; 2023-08-21

3. Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data; 2023-08-06

4. An empirical Bayes method for differential expression analysis of single cells with deep generative models; Proceedings of the National Academy of Sciences; 2023-05-16

5. scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single-cell RNA-sequencing data; 2023-05-02