Author:
Yang Xiaodong, Liu Guole, Feng Guihai, Bu Dechao, Wang Pengfei, Jiang Jie, Chen Shubai, Yang Qinmeng, Zhang Yiyang, Man Zhenpeng, Liang Zhongming, Wang Zichen, Li Yaning, Li Zheng, Liu Yana, Tian Yao, Li Ao, Dong Jingxi, Hu Zhilong, Fang Chen, Miao Hefan, Cui Lina, Deng Zixu, Jiang Haiping, Cui Wentao, Zhang Jiahao, Yang Zhaohui, Li Handong, He Xingjian, Zhong Liqun, Zhou Jiaheng, Wang Zijian, Long Qingqing, Xu Ping, Wang Hongmei, Meng Zhen, Wang Xuezhi, Wang Yangang, Wang Yong, Zhang Shihua, Guo Jingtao, Zhao Yi, Zhou Yuanchun, Li Fei, Liu Jing, Chen Yiqiang, Yang Ge, Li Xin
Abstract
Deciphering the universal gene regulatory mechanisms in diverse organisms holds great potential to advance our knowledge of fundamental life processes and to facilitate research on clinical applications. However, the traditional research paradigm focuses primarily on individual model organisms, which limits the collection and integration of complex features of various cell types across species. Recent breakthroughs in single-cell sequencing and advances in deep learning present an unprecedented opportunity to tackle this challenge. In this study, we developed GeneCompass, the first knowledge-informed, cross-species foundation model, pre-trained on an extensive dataset of over 120 million single-cell transcriptomes from human and mouse. During pre-training, GeneCompass integrates four types of biological prior knowledge in a self-supervised manner to enhance its understanding of gene regulatory mechanisms. When fine-tuned on multiple downstream tasks, GeneCompass outperforms competing state-of-the-art models on several single-species tasks and opens new avenues for cross-species biological investigation. Overall, GeneCompass marks a milestone in advancing knowledge of universal gene regulatory mechanisms and in accelerating the discovery of key cell fate regulators and candidate targets for drug development.
Publisher
Cold Spring Harbor Laboratory