Affiliation:
1. Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
2. Department of Electronic Information and Computer Engineering, The Engineering & Technical College of Chengdu University of Technology, Leshan, Sichuan 614000, China
3. McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
4. School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Abstract
In recent years, the integration of single‐cell multi‐omics data has provided a more comprehensive understanding of cell functions and internal regulatory mechanisms from a multi‐omics rather than single‐omics perspective, but it still faces many challenges, such as omics‐variance, sparsity, cell heterogeneity, and confounding factors. The cell cycle is known to act as a confounder when other factors are analyzed in single‐cell RNA‐seq data, but it is not clear how it affects integrated single‐cell multi‐omics data. Here, a cell cycle‐aware network (CCAN) is developed to remove cell cycle effects from integrated single‐cell multi‐omics data while preserving cell type‐specific variations. This is the first computational model to study cell‐cycle effects in the integration of single‐cell multi‐omics data. Validations on several benchmark datasets show the outstanding performance of CCAN in a variety of downstream analyses and applications, including removing cell cycle effects and batch effects from scRNA‐seq datasets generated with different protocols, integrating paired and unpaired scRNA‐seq and scATAC‐seq data, accurately transferring cell type labels from scRNA‐seq to scATAC‐seq data, and characterizing the differentiation process from hematopoietic stem cells to different lineages in integrated differentiation data.
Funder
National Institutes of Health
National Science Foundation