Subpopulation identification for single-cell RNA-sequencing data using functional data analysis

Published: 2019-09-12

Author:

Ahn Kyungmin, Fujiwara Hironobu

Abstract

Background: In single-cell RNA-sequencing (scRNA-seq) data analysis, a number of statistical tools from multivariate data analysis (MDA) have been developed to help analyze gene expression data. The MDA approach typically examines genes as discrete genomic units and ignores the dependency between the data components. In this paper, we propose a functional data analysis (FDA) approach to scRNA-seq data in which each cell is treated as a single function. To avoid the large number of dropouts (zero or near-zero values) and to reduce the high dimensionality of the data, we first perform a principal component analysis (PCA) and take the PCs as the amplitude of the function. We then use the index of the PCs from PCA as the phase component. This allows us to apply FDA clustering methods to scRNA-seq data.

Results: To demonstrate the robustness of our method, we apply several existing FDA clustering algorithms to the gene expression data and compare their accuracy in classifying cell types against conventional MDA clustering methods. The FDA clustering algorithms achieve superior accuracy on simulated data as well as on real human and mouse scRNA-seq data.

Conclusions: This new statistical technique enhances classification performance and ultimately improves the understanding of stochastic biological processes. The framework provides a fundamentally different approach to scRNA-seq data analysis that can complement conventional MDA methods. It can be particularly effective when current MDA methods cannot uncover the hidden functional nature of gene expression dynamics.
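
The construction described in the abstract (run PCA, then treat each cell's PC scores as the values of a function over the PC index) can be illustrated with a small sketch. This is only one possible reading of the abstract, not the authors' implementation. The toy data, the helper names (`center_columns`, `top_components`, `cells_as_curves`, `l2_distance`), and the power-iteration PCA are all assumptions made for illustration. The abstract does not name the FDA clustering algorithms it uses, so a plain L2 distance between curves stands in for them here.

```python
import math
import random


def center_columns(matrix):
    """Subtract each gene's (column's) mean so PCA sees centered data."""
    n_rows = len(matrix)
    means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_components(centered, n_components, n_iter=500):
    """Leading principal axes by power iteration with deflation (toy scale only)."""
    n_genes = len(centered[0])
    # Gene-by-gene scatter matrix X^T X, proportional to the covariance matrix.
    scatter = [[sum(row[i] * row[j] for row in centered) for j in range(n_genes)]
               for i in range(n_genes)]
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    axes = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(n_genes)]
        for _ in range(n_iter):
            w = matvec(scatter, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, matvec(scatter, v)))
        axes.append(v)
        # Deflate: remove the component just found before searching for the next.
        scatter = [[scatter[i][j] - eigval * v[i] * v[j] for j in range(n_genes)]
                   for i in range(n_genes)]
    return axes


def cells_as_curves(matrix, n_components):
    """Represent each cell as a curve: value at grid point k = score on PC k."""
    centered = center_columns(matrix)
    axes = top_components(centered, n_components)
    return [[sum(x * a for x, a in zip(row, axis)) for axis in axes]
            for row in centered]


def l2_distance(f, g):
    """Trapezoidal approximation of the L2 distance between two curves
    sampled on the integer grid of PC indices 1..K."""
    sq = [(a - b) ** 2 for a, b in zip(f, g)]
    integral = sum((sq[k] + sq[k + 1]) / 2 for k in range(len(sq) - 1))
    return math.sqrt(integral)


def toy_cell(high_genes, rng, n_genes=6):
    """Hypothetical cell: high expression in `high_genes`, small noise elsewhere."""
    return [(5.0 if g in high_genes else 0.0) + rng.gauss(0, 0.3)
            for g in range(n_genes)]


rng = random.Random(1)
cells = ([toy_cell({0, 1, 2}, rng) for _ in range(5)]
         + [toy_cell({3, 4, 5}, rng) for _ in range(5)])

curves = cells_as_curves(cells, n_components=3)
within = l2_distance(curves[0], curves[1])   # two cells of the same toy type
between = l2_distance(curves[0], curves[5])  # cells of different toy types
print("same type closer than different type:", within < between)
```

In practice the curves would more likely be smoothed with a basis expansion and passed to a dedicated functional clustering method. The sketch only shows how a cell-by-gene matrix becomes one curve per cell.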

Publisher

Cold Spring Harbor Laboratory
