Affiliation:
1. Department of Energy Conversion and Storage, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
2. Abzu ApS, 2150 København, Denmark
Abstract
Single-cell RNA sequencing (scRNA-seq) has significantly advanced our understanding of cellular diversity and of how this diversity is implicated in disease. Yet translating findings across scRNA-seq datasets remains challenging because of technical variability and dataset-specific biases. To overcome these challenges, we present a novel approach that combines an LLM-based framework with explainable machine learning to generalize across single-cell datasets and to identify gene signatures that capture disease-driven transcriptional changes. Our approach uses scBERT, which harnesses transcriptomic features shared among cell types to establish consistent cell-type annotations across multiple scRNA-seq datasets. We then employ a symbolic regression algorithm to identify highly relevant yet minimally redundant models and features for inferring a cell type's disease state from its transcriptomic profile. We assessed how well these cell-specific gene signatures transfer across datasets, demonstrating their robustness as molecular markers for identifying and characterizing disease-associated cell types. Validation was carried out using four publicly available scRNA-seq datasets from healthy individuals and from patients with ulcerative colitis (UC). The results demonstrate the approach's efficacy in bridging dataset-specific disparities and in enabling comparative analyses. Notably, the simplicity and symbolic nature of the retrieved gene signatures facilitate their interpretability, allowing us to elucidate underlying molecular disease mechanisms using these models.
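To make the idea of a transferable symbolic gene signature concrete, the sketch below scores cells with a fixed closed-form expression and checks how well that score separates diseased from healthy cells in two synthetic datasets that differ by a technical scale factor. It is an illustration under stated assumptions, not the authors' pipeline: the gene names `GENE_A` and `GENE_B`, the product-form signature, the synthetic data generator, and the scale factors are all hypothetical, and no symbolic regression or scBERT annotation is performed here.

```python
import random


def signature_score(cell):
    # Hypothetical symbolic signature: the product of two gene expression values.
    # A real signature would be discovered by symbolic regression, not hand-written.
    return cell["GENE_A"] * cell["GENE_B"]


def auc(scores, labels):
    # Rank-based AUC: probability that a diseased cell (label 1) scores higher
    # than a healthy cell (label 0), counting ties as one half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_dataset(n_cells, batch_scale, seed):
    # Synthetic stand-in for one scRNA-seq dataset (assumption for illustration).
    # batch_scale mimics a dataset-specific technical effect on expression levels.
    rng = random.Random(seed)
    cells, labels = [], []
    for _ in range(n_cells):
        diseased = rng.random() < 0.5
        mu = 0.7 if diseased else 0.0  # diseased cells express both genes more
        cells.append({
            "GENE_A": batch_scale * rng.lognormvariate(mu, 0.3),
            "GENE_B": batch_scale * rng.lognormvariate(mu, 0.3),
        })
        labels.append(1 if diseased else 0)
    return cells, labels


def evaluate(n_cells, batch_scale, seed):
    # Apply the same fixed signature to a dataset and report its AUC.
    cells, labels = make_dataset(n_cells, batch_scale, seed)
    return auc([signature_score(c) for c in cells], labels)


if __name__ == "__main__":
    for name, scale, seed in [("dataset_1", 1.0, 0), ("dataset_2", 3.0, 1)]:
        print(name, round(evaluate(200, scale, seed), 3))
```

Because the AUC depends only on the ranking of cells within a dataset, a purely multiplicative scale difference between datasets does not change it. Real batch effects are rarely this simple, which is why validation on independent datasets matters.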
Subject
Molecular Biology, Biochemistry
Cited by
2 articles.