Identifying cell states in single-cell RNA-seq data at statistically maximal resolution

Published: 2023-11-03

Authors:

Grobecker Pascal, van Nimwegen Erik

Abstract

Single-cell RNA sequencing (scRNA-seq) has become a popular experimental method for studying variation in gene expression within a population of cells. However, obtaining an accurate picture of the diversity of distinct gene expression states present in a given dataset is highly challenging because of the sparsity of scRNA-seq data and its inhomogeneous measurement noise. A vast number of methods are used in the literature to cluster cells into subsets with ‘similar’ expression profiles. These methods generally share several problems:
- they lack rigorously specified objectives;
- they involve multiple complex layers of normalization, filtering, feature selection, and dimensionality reduction;
- they employ ad hoc measures of distance or similarity between cells;
- they often ignore the known noise properties of scRNA-seq measurements;
- they include a large number of tunable parameters.

Consequently, it is virtually impossible to assign concrete biophysical meaning to the clusterings these methods produce.

Here we address the following problem: given the raw unique molecular identifier (UMI) counts of an scRNA-seq dataset, partition the cells into subsets such that the gene expression states of the cells within each subset are statistically indistinguishable, and each subset corresponds to a distinct gene expression state. That is, we aim to partition cells so as to maximally reduce the complexity of the dataset without removing any of its meaningful structure. We show that, given the known measurement noise structure of scRNA-seq data, this problem is mathematically well defined, and we derive its unique solution from first principles. We have implemented this solution in a tool called Cellstates. It operates directly on the raw data and automatically determines the optimal partition and number of clusters, with zero tunable parameters.

We show that, on synthetic datasets, Cellstates almost perfectly recovers the optimal partitions. On real data, Cellstates robustly identifies subtle substructure within groups of cells that are traditionally annotated as a single cell type. Moreover, we show that the diversity of gene expression states that Cellstates identifies depends systematically on the tissue of origin. It does not depend on technical features of the experiments, such as the total number of cells and the total UMI count per cell. In addition to the Cellstates tool, we provide a small software toolbox for three further tasks: placing the identified cell states into a hierarchical tree of higher-order clusters, identifying the most important marker genes at each branch of this hierarchy, and visualizing these results.
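The abstract does not state the model behind Cellstates. The Python below is therefore only a hypothetical sketch of one simple objective that fits the description: "partition raw UMI counts so that cells within a subset are statistically indistinguishable under multinomial sampling noise." It makes three assumptions:
- all cells in a cluster share one gene-expression frequency vector;
- that vector has a symmetric Dirichlet prior with pseudocount `alpha`;
- each cell's UMI counts are a multinomial sample from it.

Under these assumptions, each cluster's evidence is a Dirichlet-multinomial marginal likelihood. The names `cluster_log_evidence`, `partition_log_evidence`, and `greedy_merge`, the fixed `alpha`, and the greedy pairwise-merging search are all illustrative assumptions. They are not the tool's actual implementation or search procedure; in particular, Cellstates itself is described as having zero tunable parameters.

```python
import math
from collections import defaultdict


def cluster_log_evidence(gene_totals, alpha):
    """Log Dirichlet-multinomial evidence of one cluster's pooled UMI counts.

    ASSUMPTION (illustrative): every cell in the cluster shares one expression
    frequency vector with a symmetric Dirichlet(alpha) prior. Per-cell
    multinomial coefficients are omitted: they are the same for any partition.
    """
    n_genes = len(gene_totals)
    a_total = alpha * n_genes
    n_total = sum(gene_totals)
    score = math.lgamma(a_total) - math.lgamma(n_total + a_total)
    for n in gene_totals:
        score += math.lgamma(n + alpha) - math.lgamma(alpha)
    return score


def partition_log_evidence(counts, labels, alpha=1.0):
    """Sum of cluster evidences for a labelling of cells (rows of `counts`)."""
    pooled = defaultdict(lambda: [0] * len(counts[0]))
    for cell, label in zip(counts, labels):
        totals = pooled[label]
        for g, n in enumerate(cell):
            totals[g] += n
    return sum(cluster_log_evidence(t, alpha) for t in pooled.values())


def greedy_merge(counts, alpha=1.0):
    """Hypothetical search: start from singleton clusters and repeatedly merge
    the pair whose merge most increases the total evidence; stop when no merge
    increases it. Returns one integer label per cell."""
    clusters = [list(cell) for cell in counts]    # pooled gene totals per cluster
    members = [[i] for i in range(len(counts))]   # cell indices per cluster
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = [x + y for x, y in zip(clusters[a], clusters[b])]
                gain = (cluster_log_evidence(merged, alpha)
                        - cluster_log_evidence(clusters[a], alpha)
                        - cluster_log_evidence(clusters[b], alpha))
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # every possible merge would lower the evidence
            break
        a, b = best_pair
        clusters[a] = [x + y for x, y in zip(clusters[a], clusters[b])]
        members[a] += members[b]
        del clusters[b], members[b]  # b > a, so index a stays valid
    labels = [0] * len(counts)
    for k, cells in enumerate(members):
        for i in cells:
            labels[i] = k
    return labels


if __name__ == "__main__":
    # Toy data: 4 cells x 3 genes; cells 0/1 and cells 2/3 look alike.
    counts = [[10, 0, 0], [9, 1, 0], [0, 0, 10], [0, 1, 9]]
    print(greedy_merge(counts))  # → [0, 0, 1, 1]
```

In this toy model, merging two cells with similar count profiles raises the evidence, while merging dissimilar cells lowers it. The number of clusters therefore emerges from the data rather than being set in advance. A greedy search was chosen here only for brevity; it can get stuck in local optima.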

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article.

1. AI writes summaries of preprints in bioRxiv trial; Nature; 2023-11-14