ClusterDE: a post-clustering differential expression (DE) method robust to false-positive inflation caused by double dipping

Published: 2023-07-25

Author:

Song Dongyuan, Li Kexin, Ge Xinzhou, Li Jingyi Jessica

Abstract

In typical single-cell RNA-seq (scRNA-seq) data analysis, a clustering algorithm is applied to find putative cell types as clusters, and then a statistical differential expression (DE) test is used to identify the differentially expressed (DE) genes between the cell clusters. However, this common procedure uses the same data twice, an issue known as “double dipping”: the same data is used to define both cell clusters and DE genes, leading to false-positive DE genes even when the cell clusters are spurious. To overcome this challenge, we propose ClusterDE, a post-clustering DE test for controlling the false discovery rate (FDR) of identified DE genes regardless of clustering quality. The core idea of ClusterDE is to generate real-data-based synthetic null data with only one cluster, as a counterfactual in contrast to the real data, for evaluating the whole procedure of clustering followed by a DE test. Using comprehensive simulation and real data analysis, we show that ClusterDE has not only solid FDR control but also the ability to find cell-type marker genes that are biologically meaningful. ClusterDE is fast, transparent, and adaptive to a wide range of clustering algorithms and DE tests. Besides scRNA-seq data, ClusterDE is generally applicable to post-clustering DE analysis, including single-cell multi-omics data analysis.
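The core idea in the abstract (run the same clustering-plus-DE procedure on the real data and on a one-cluster synthetic null, then contrast the two) can be illustrated with a toy sketch. This is not the authors' implementation. The median-split "clustering", the Welch t statistic, the independent-Gaussian null generator, and the knockoff-style selection rule are all assumptions chosen to keep the example standard-library-only, and every function name here is hypothetical.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Absolute Welch t statistic for two samples (larger = more 'DE')."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return abs(statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def cluster_then_test(cells):
    """Toy 'clustering + DE' pipeline (assumption): split cells at the median
    of their total expression, then score each gene with a Welch t statistic."""
    totals = [sum(c) for c in cells]
    cut = statistics.median(totals)
    g1 = [c for c, t in zip(cells, totals) if t <= cut]
    g2 = [c for c, t in zip(cells, totals) if t > cut]
    n_genes = len(cells[0])
    return [welch_t([c[j] for c in g1], [c[j] for c in g2]) for j in range(n_genes)]


def synthetic_null(cells, rng):
    """One-cluster null (assumption): redraw each gene from a single Gaussian
    fitted to the real data, treating genes as independent."""
    n_genes = len(cells[0])
    params = []
    for j in range(n_genes):
        col = [c[j] for c in cells]
        params.append((statistics.mean(col), statistics.stdev(col)))
    return [[rng.gauss(m, s) for m, s in params] for _ in cells]


def select_by_contrast(contrast, q):
    """Knockoff-style rule (assumption): find the smallest t > 0 with
    (1 + #{c <= -t}) / max(#{c >= t}, 1) <= q and return genes with c >= t."""
    for t in sorted({abs(c) for c in contrast if c != 0}):
        neg = sum(1 for c in contrast if c <= -t)
        pos = sum(1 for c in contrast if c >= t)
        if (1 + neg) / max(pos, 1) <= q:
            return [j for j, c in enumerate(contrast) if c >= t]
    return []


rng = random.Random(0)
# "Real" data with NO true clusters: 200 cells x 20 genes of pure noise.
real = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(200)]

# Same pipeline on the real data and on the one-cluster synthetic null.
real_scores = cluster_then_test(real)
null_scores = cluster_then_test(synthetic_null(real, rng))
contrast = [r - n for r, n in zip(real_scores, null_scores)]

naive_hits = [j for j, s in enumerate(real_scores) if s > 1.96]
contrast_hits = select_by_contrast(contrast, q=0.05)
print("naive DE calls on one-cluster data:", len(naive_hits))
print("calls after contrasting with the null:", len(contrast_hits))
```

Because the clusters are defined from the same values that are then tested, the naive scores on this noise-only data tend to look significant. The null data goes through the identical procedure, so its scores are inflated in a similar way, and the contrast between the two is what the selection rule thresholds.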

Publisher

Cold Spring Harbor Laboratory

References: 37 articles.

1. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications; Genome Medicine, 2017

2. Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data; Nature Protocols, 2021

3. Integrated analysis of multimodal single-cell data

4. RNA virus interference via CRISPR/Cas13a system in plants

5. Valid post-clustering differential analysis for single-cell RNA-seq; Cell Systems, 2019

Cited by 9 articles.

1. Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics; Nature Reviews Molecular Cell Biology; 2024-08-21

2. Single-cell omics: experimental workflow, data analyses and applications; Science China Life Sciences; 2024-07-23

3. BacSC: A general workflow for bacterial single-cell RNA sequencing data analysis; 2024-06-27

4. Systematic Evaluation of Cell Type Deconvolution Methods for Plasma Cell-free DNA; 2024-03-29

5. SciGeneX: Enhancing transcriptional analysis through gene module detection in single-cell and spatial transcriptomics data; 2024-03-20