Abstract
Grouping individual cells into clusters and annotating these clusters based on feature expression is a common procedure in single-cell analysis pipelines. Multiple methods have been reported for single-cell mRNA sequencing and cytometry datasets, the vast majority of which rely on sequential two-step procedures involving (I) cell clustering based on notions of similarity and (II) cluster annotation via manual or semi-automated methods. However, because arbitrary borders are drawn between more or less similar groups of cells, one cannot guarantee that all cells within a cluster are of the same type. Further, dimensionality reduction has been shown to cause considerable distortion in high-dimensional datasets and can lead to variable annotations of the same cell when the relative composition of the data changes. Another limitation of existing methods is that simultaneous analyses of large sets of cells are computationally expensive and difficult to scale to growing datasets or meta-analyses across multiple datasets. Here we present an alternative method based on calculation of Earth Mover’s Distance and a Bayesian classifier coupled to Random Forest, which annotates one cell at a time, removing the need for prior clustering and resulting in improved accuracy, better scaling with increasing cell numbers, and lower computational resource requirements.
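The abstract does not state how the Earth Mover’s Distance is computed or how it feeds into the classifier, so the following is only a minimal sketch of the distance itself, under the assumption of one-dimensional samples with equal weight per observation. In that special case the distance equals the area between the two empirical cumulative distribution functions. The function name `emd_1d` and the example values are hypothetical and are not taken from the authors' implementation.

```python
from bisect import bisect_right


def emd_1d(xs, ys):
    """Earth Mover's Distance between two 1-D empirical samples.

    Assumption: every observation carries equal weight within its sample.
    The distance is the integral of |CDF_x(t) - CDF_y(t)| over t.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    # Both CDFs are step functions that only change at observed values,
    # so it is enough to evaluate them on the merged set of values.
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_x = bisect_right(xs, left) / len(xs)  # fraction of xs <= left
        cdf_y = bisect_right(ys, left) / len(ys)  # fraction of ys <= left
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


# Hypothetical expression values of one marker in two groups of cells.
reference = [0.0, 1.0, 3.0]
shifted = [5.0, 6.0, 8.0]
distance = emd_1d(reference, shifted)
```

Because `shifted` is `reference` moved by a constant, the distance equals the size of that shift; identical samples give a distance of zero.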
Publisher
Cold Spring Harbor Laboratory