Abstract
We propose a novel strategy for incorporating hierarchical supervised label information into nonlinear dimensionality reduction techniques. Specifically, we extend t-SNE, UMAP, and PHATE to include known or predicted class labels and demonstrate the efficacy of our approach on multiple single-cell RNA sequencing datasets. Our approach, “Haisu,” is applicable across domains and methods of nonlinear dimensionality reduction. In general, the mathematical effect of Haisu can be summarized as a variable perturbation of the high-dimensional space in which the original data is observed. We thereby preserve the core characteristics of the visualization method and change the manifold only to respect known or assumed class labels when they are provided. Our strategy is designed to aid in the discovery and understanding of underlying patterns in a dataset that is heavily influenced by parent-child relationships. We show that our approach can also help in semi-supervised settings where labels are known for only some datapoints (for instance, when only a fraction of the cells is labeled). In summary, Haisu extends existing popular visualization methods to enable a user to incorporate known, relevant relationships via a user-defined hierarchical distancing factor.
Availability: github.com/Cobanoglu-Lab/Haisu
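To make the idea of a label-driven perturbation of the high-dimensional space concrete, the following is a minimal sketch and not the authors' formula. It assumes a hypothetical child-to-parent label map (`PARENT`), a hypothetical `strength` parameter in [0, 1] standing in for the user-defined hierarchical distancing factor, and an assumed scaling rule in which pairwise Euclidean distances shrink more the closer two labels sit in the hierarchy. Unlabeled points (label `None`) are left unperturbed, which mirrors the semi-supervised setting described above.

```python
import math

# Hypothetical label hierarchy (child -> parent); the root has parent None.
PARENT = {
    "cell": None,
    "immune": "cell",
    "T cell": "immune",
    "B cell": "immune",
    "neuron": "cell",
}


def ancestors(label, parent):
    """Return the path from a label up to the root, inclusive."""
    path = []
    while label is not None:
        path.append(label)
        label = parent[label]
    return path


def tree_distance(a, b, parent):
    """Number of edges on the path between two labels in the hierarchy."""
    path_a = ancestors(a, parent)
    index_in_a = {node: i for i, node in enumerate(path_a)}
    for j, node in enumerate(ancestors(b, parent)):
        if node in index_in_a:  # first common ancestor
            return index_in_a[node] + j
    raise ValueError("labels do not share a root")


def perturbed_distances(points, labels, parent, strength=0.5):
    """Scale pairwise Euclidean distances by an assumed hierarchy factor.

    Assumed rule (illustrative only):
        factor = 1 - strength * (1 - h / h_max)
    where h is the tree distance between the two labels and h_max is the
    largest tree distance among the labels present. Pairs involving an
    unlabeled point keep their original distance.
    """
    n = len(points)
    known = sorted({lab for lab in labels if lab is not None})
    h = {(a, b): tree_distance(a, b, parent) for a in known for b in known}
    h_max = max(h.values(), default=0) or 1  # avoid division by zero
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if labels[i] is not None and labels[j] is not None:
                d *= 1.0 - strength * (1.0 - h[(labels[i], labels[j])] / h_max)
            dist[i][j] = dist[j][i] = d
    return dist


# Toy data: two T cells, one neuron, and one unlabeled point.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (3.0, 4.0)]
labels = ["T cell", "T cell", "neuron", None]
D = perturbed_distances(points, labels, PARENT, strength=0.5)
```

In this sketch the two T cells are pulled together (their distance is halved), the T cell and neuron pair keeps its full distance, and the unlabeled point is unaffected. A matrix such as `D` could then be handed to an embedding method that accepts precomputed distances, so the visualization method itself stays unchanged and only its input space is perturbed.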
Publisher
Cold Spring Harbor Laboratory