Abstract
Analysis of single-cell multiomics datasets is a novel topic and is considerably challenging because such datasets contain a large number of features with numerous missing values. In this study, we implemented a recently proposed tensor decomposition (TD)-based unsupervised feature extraction (FE) technique to address this difficult problem. The technique can successfully integrate single-cell multiomics data composed of gene expression, DNA methylation, and accessibility. Although the last two have very large dimensions, as many as ten million features, of which only a few percent are non-zero, TD-based unsupervised FE can integrate the three omics datasets without filling in missing values. Together with UMAP, which is frequently used to embed single-cell measurements into two-dimensional space, TD-based unsupervised FE can produce a two-dimensional embedding consistent with classification when integrating single-cell omics datasets. Genes selected by TD-based unsupervised FE were also significantly related to reasonable biological roles.
Publisher
Cold Spring Harbor Laboratory