1. A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), 6000–6010, Curran Associates Inc., (Red Hook, NY, USA) (2017).
2. J. Devlin, M.-W. Chang, K. Lee, et al., “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Eds., 4171–4186, Association for Computational Linguistics, (Minneapolis, Minnesota) (2019).
3. A. Radford, K. Narasimhan, T. Salimans, et al., “Improving language understanding by generative pre-training,” OpenAI Technical Report (2018).
4. OpenAI, “GPT-4 technical report,” arXiv preprint arXiv:2303.08774 (2023).
5. Y. Wang, Z. Yu, Z. Zeng, et al., “PandaLM: An automatic evaluation benchmark for LLM instruction tuning optimization,” (2024).