Identification of Interpretable Clusters and Associated Signatures in Breast Cancer Single-Cell Data: A Topic Modeling Approach

Published:2024-03-29
Issue:7
Volume:16
Page:1350
ISSN:2072-6694
Container-title:Cancers
language:en
Short-container-title:Cancers

Author:

Malagoli Gabriele¹², Valle Filippo², Barillot Emmanuel¹, Caselle Michele², Martignetti Loredana¹

Affiliation:

1. Institut Curie, Inserm U900, Mines ParisTech, PSL Research University, 75248 Paris, France

2. Physics Department, University of Turin and INFN, 10125 Turin, Italy

Abstract

Topic modeling is a popular technique in machine learning and natural language processing, where a corpus of text documents is classified into themes or topics using word frequency analysis. This approach has proven successful in various biological data analysis applications, such as predicting cancer subtypes with high accuracy and identifying genes, enhancers, and stable cell types simultaneously from sparse single-cell epigenomics data. The advantage of using a topic model is that it not only serves as a clustering algorithm, but it can also explain clustering results by providing word probability distributions over topics. Our study proposes a novel topic modeling approach for clustering single cells and detecting topics (gene signatures) in single-cell datasets that measure multiple omics simultaneously. We applied this approach to examine the transcriptional heterogeneity of luminal and triple-negative breast cancer cells using patient-derived xenograft models with acquired resistance to chemotherapy and targeted therapy. Through this approach, we identified protein-coding genes and long non-coding RNAs (lncRNAs) that group thousands of cells into biologically similar clusters, accurately distinguishing drug-sensitive and -resistant breast cancer types. In comparison to standard state-of-the-art clustering analyses, our approach offers an optimal partitioning of genes into topics and cells into clusters simultaneously, producing easily interpretable clustering outcomes. Additionally, we demonstrate that an integrative clustering approach, which combines the information from mRNAs and lncRNAs treated as disjoint omics layers, enhances the accuracy of cell classification.
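The interpretability claim in the abstract can be illustrated with a minimal sketch. It assumes a topic model has already been fitted and has produced two tables: a gene probability distribution per topic and a topic mixture per cell. All gene names, cell names, and numbers below are hypothetical, and the hard assignment by most probable topic is a simplification chosen for illustration, not the authors' method.

```python
# Toy, made-up outputs of an already fitted topic model.
# Every name and number here is hypothetical, for illustration only.

# P(gene | topic): each topic is a probability distribution over genes.
topic_gene = {
    "topic_0": {"GENE_A": 0.60, "GENE_B": 0.30, "LNC_X": 0.05, "LNC_Y": 0.05},
    "topic_1": {"GENE_A": 0.05, "GENE_B": 0.05, "LNC_X": 0.50, "LNC_Y": 0.40},
}

# P(topic | cell): each cell is described as a mixture of topics.
cell_topic = {
    "cell_1": {"topic_0": 0.9, "topic_1": 0.1},
    "cell_2": {"topic_0": 0.8, "topic_1": 0.2},
    "cell_3": {"topic_0": 0.2, "topic_1": 0.8},
}


def assign_clusters(cell_topic):
    """Hard-assign each cell to its most probable topic."""
    return {cell: max(mix, key=mix.get) for cell, mix in cell_topic.items()}


def top_genes(topic_gene, n=2):
    """Return the n most probable genes of each topic (its signature)."""
    return {
        topic: sorted(dist, key=dist.get, reverse=True)[:n]
        for topic, dist in topic_gene.items()
    }


clusters = assign_clusters(cell_topic)
signatures = top_genes(topic_gene)

# Each cluster label comes with the genes that explain it.
for cell, topic in clusters.items():
    print(cell, "->", topic, "signature:", signatures[topic])
```

Because each topic is itself a distribution over genes, every cluster label can be read back as a ranked gene list, which is what distinguishes this kind of output from a clustering that returns labels alone.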

Publisher

MDPI AG

Link

https://www.mdpi.com/2072-6694/16/7/1350/pdf

Reference: 59 articles.

1. Yu, L., Cao, Y., Yang, J.Y.H., and Yang, P. (2022). Benchmarking clustering algorithms on estimating the number of cell types from single-cell RNA-sequencing data. Genome Biol., 23.

2. Kiselev, V.Y., Andrews, T.S., and Hemberg, M. (2019). Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet., 20.

3. Valle, F., Osella, M., and Caselle, M. (2020). A Topic Modeling Analysis of TCGA Breast and Lung Cancer Transcriptomic Data. Cancers, 12.

4. Valle, F., Osella, M., and Caselle, M. (2022). Multiomics Topic Modeling for Breast Cancer Classification. Cancers, 14.

5. Morelli, L., Giansanti, V., and Cittaro, D. (2021). Nested Stochastic Block Models applied to the analysis of single cell data. BMC Bioinform., 22.