CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells

Published: 2024-06-06

Author:

Zeng Yuansong, Xie Jiancong, Wei Zhuoyi, Su Yun, Shangguan Ningyuan, Yang Shuangyu, Zhang Chengyang, Li Wenbing, Zhang Jinbo, Fang Nan, Zhang Hongyu, Zhao Huiying, Lu Yutong, Fan Jue, Yu Weijiang, Yang Yuedong

Abstract

The rapid evolution of single-cell sequencing technologies has enabled precise transcriptomic profiling at the single-cell level, shedding light on the intricate heterogeneity within cellular populations. Despite these advances, the inherent diversity of cells and data challenges such as noise, batch effects, and sparsity underscore the pressing need for a unified model that can learn and represent cellular states effectively. Single-cell large language models (LLMs) have been developed to bridge this gap, yet they show limited performance on human cells. This shortfall may stem from the confounding effects of training on data from diverse species, a practice driven partly by the limited number of cells available for any single species. Here, we compiled a dataset of approximately 100 million human cells, sequenced by multiple technologies, from human single-cell datasets in various file formats deposited in public databases and websites. Leveraging these extensive data cohorts, we developed CellFM, a single-cell foundation model with 800 million parameters, an eight-fold increase over the current largest single-species model. To enable training of CellFM on Huawei's MindSpore AI framework, we adopted RetNet, a Transformer variant with linear complexity that balances efficiency and performance, as the backbone of our model. Our comprehensive experiments show that CellFM outperforms existing models across diverse applications, including cell annotation, perturbation prediction, and gene function prediction.
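The abstract describes RetNet only as a Transformer variant with linear complexity. As a minimal sketch of where that property comes from, assuming the simplified single-head retention of the original RetNet formulation (rotary/xPos terms, multi-scale decay, and group normalization omitted), the NumPy code below computes retention in its parallel form, (Q Kᵀ ⊙ D) V with a causal decay mask D[n, m] = γ^(n−m), and in its recurrent form, then checks that they agree. This illustrates the generic mechanism only and is not CellFM's MindSpore implementation; the function names and the decay value `gamma = 0.9` are hypothetical.

```python
import numpy as np


def retention_parallel(q, k, v, gamma):
    """Parallel form: (Q K^T * D) V, with D[n, m] = gamma**(n - m) for n >= m, else 0."""
    n = q.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]
    # Causal decay mask: future positions (diff < 0) are zeroed out.
    decay = np.where(diff >= 0, gamma ** np.clip(diff, 0, None), 0.0)
    return (q @ k.T * decay) @ v


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + outer(k_n, v_n); o_n = q_n S_n.

    Each step updates a fixed-size (d_k x d_v) state, so the total cost
    grows linearly with sequence length.
    """
    d_k, d_v = q.shape[1], v.shape[1]
    state = np.zeros((d_k, d_v))
    out = np.empty((q.shape[0], d_v))
    for t in range(q.shape[0]):
        state = gamma * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out


# Usage: random queries, keys, and values for a short hypothetical sequence.
rng = np.random.default_rng(0)
seq_len, d_k, d_v = 6, 4, 3
q = rng.standard_normal((seq_len, d_k))
k = rng.standard_normal((seq_len, d_k))
v = rng.standard_normal((seq_len, d_v))

# Both forms produce the same outputs.
print(np.allclose(retention_parallel(q, k, v, 0.9), retention_recurrent(q, k, v, 0.9)))
```

The parallel form builds an n × n matrix, so it costs quadratic memory but parallelizes well during training. The recurrent form carries only a constant-size state from token to token, which is the source of the linear complexity the abstract refers to.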

Publisher

Cold Spring Harbor Laboratory
