Abstract
Single-cell RNA sequencing (scRNA-seq) technology measures the expression of thousands of genes at the cellular level. Analyzing single-cell transcriptomes allows the identification of heterogeneous cell groups, cellular-level regulation, and the trajectory of cell development. An important aspect of scRNA-seq data analysis is the clustering of cells, which is hampered by issues such as high dimensionality, cell type imbalance, redundancy, and dropout. Given that cells of each type are functionally consistent, incorporating biological relations between genes may improve the clustering results. Here, we develop a deep embedded clustering method, G3DC, that incorporates a graph loss based on an existing gene network, together with a reconstruction loss, to achieve an embedding that is both discriminative and informative. The involvement of the gene network strengthens clustering performance while helping to select functionally coherent genes that contribute to the clustering results. In addition, the method is well adapted to sparse and zero-inflated scRNA-seq data through the use of the ℓ2,1-norm. Extensive experiments show that G3DC offers high clustering accuracy, measured by agreement with true cell types, outperforming other leading single-cell clustering methods. G3DC also selects biologically relevant genes that contribute to the clustering, providing insight into the biological functions that differentiate cell groups.
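For readers unfamiliar with the two regularizers named in the abstract, the following is a minimal sketch in plain Python. The ℓ2,1-norm is shown in its standard form (the sum of the Euclidean norms of a matrix's rows), which encourages entire rows to shrink to zero. The graph penalty is only an illustrative Laplacian-style smoothness term over network edges. The abstract does not specify how G3DC defines its graph loss or where it applies the norm, so the function names, the choice of rows as genes, and the toy values are all assumptions made for illustration.

```python
import math


def l21_norm(matrix):
    """Standard l2,1-norm: sum of the Euclidean norms of the rows."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def graph_penalty(weights, edges):
    """Hypothetical smoothness penalty (not taken from the paper):
    sum over edges (i, j) of the squared Euclidean distance between
    rows i and j, so that connected genes get similar weights."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(weights[i], weights[j]))
        for i, j in edges
    )


# Toy example (assumed): 3 genes (rows) x 2 hidden units (columns).
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
edges = [(0, 2)]  # assumed gene-network edge between gene 0 and gene 2

row_sparsity_term = l21_norm(W)
network_term = graph_penalty(W, edges)
```

In this sketch the all-zero second row contributes nothing to the ℓ2,1 term, which is the row-level sparsity behavior that makes such a norm useful for selecting a subset of genes.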
Publisher
Cold Spring Harbor Laboratory