Abstract
Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce “all-in-one” visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counterproductive for exploratory biological analyses. Instead, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.
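The distortion claim in the abstract can be illustrated with a small toy sketch. It is not the authors' analysis: the data (standard basis vectors, which are mutually equidistant), the random linear projection, the helper names, and the coefficient-of-variation measure are all assumptions chosen for illustration. More than three points cannot stay mutually equidistant in a plane, so projecting 50 such points to 2 dimensions must change their relative distances.

```python
import itertools
import math
import random
import statistics


def pairwise_distances(points):
    """Return the Euclidean distance for every unordered pair of points."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def relative_spread(distances):
    """Coefficient of variation: 0 means all distances are identical."""
    return statistics.pstdev(distances) / statistics.fmean(distances)


def random_linear_projection(points, out_dim, seed=0):
    """Project points onto out_dim random Gaussian directions (hypothetical helper)."""
    rng = random.Random(seed)
    in_dim = len(points[0])
    directions = [
        [rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)
    ]
    return [
        [sum(x * w for x, w in zip(p, direction)) for direction in directions]
        for p in points
    ]


dim = 50
# Toy data (an assumption): the standard basis vectors of R^50,
# every pair of which is exactly sqrt(2) apart.
high = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
low = random_linear_projection(high, out_dim=2)

spread_high = relative_spread(pairwise_distances(high))
spread_low = relative_spread(pairwise_distances(low))
print(f"relative spread of distances in {dim}-D: {spread_high:.3f}")
print(f"relative spread of distances in 2-D: {spread_low:.3f}")
```

In the original space the relative spread is zero because every pair is equally far apart; after projection it is clearly positive, so some pairs appear much closer than others even though no such structure exists in the data. Nonlinear methods such as t-SNE or UMAP are not covered by this sketch.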
Funder
National Institutes of Health
Publisher
Public Library of Science (PLoS)
Subject
Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics
Cited by
77 articles.