A novel metric reveals previously unrecognized distortion in dimensionality reduction of scRNA-seq data

Published: 2019-07-02

Author:

Cooley Shamus M., Hamilton Timothy, Aragones Samuel D., Ray J. Christian J., Deeds Eric J.

Abstract

High-dimensional data are becoming increasingly common in nearly all areas of science. Developing approaches to analyze these data and understand their meaning is a pressing issue. This is particularly true for single-cell RNA-seq (scRNA-seq), a technique that simultaneously measures the expression of tens of thousands of genes in thousands to millions of single cells. The emerging consensus for analysis workflows significantly reduces the dimensionality of the dataset before performing downstream analysis, such as assignment of cell types. One problem with this approach is that dimensionality reduction can introduce substantial distortion into the data; consider the familiar example of trying to represent the three-dimensional earth as a two-dimensional map. It is currently unclear whether such distortion affects analysis of scRNA-seq data. Here, we introduce a straightforward approach to quantifying this distortion by comparing the local neighborhoods of points before and after dimensionality reduction. We found that popular techniques like t-SNE and UMAP introduce substantial distortion even for relatively simple simulated datasets. For scRNA-seq data, we found the distortion in local neighborhoods was often greater than 95% in the representations typically used for downstream analyses. This level of distortion can introduce errors into cell type identification, pseudotime ordering, and other analyses. We found that principal component analysis can generate accurate embeddings, but only when using dimensionalities that are much higher than typically used in scRNA-seq analysis. Our work suggests the need for a new generation of dimensionality reduction algorithms that can accurately embed high-dimensional data in their true latent dimension.
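The abstract describes the metric only as a comparison of local neighborhoods before and after dimensionality reduction. The sketch below shows one way such a comparison could be computed. It is an illustration under stated assumptions, not the authors' implementation: it assumes a neighborhood is the set of k nearest neighbors under Euclidean distance, and it assumes the comparison is the Jaccard distance between the two neighbor sets, averaged over all points. The function names `knn_indices`, `neighborhood_distortion`, and `pca_embed` are hypothetical, and PCA is done with a plain NumPy SVD to keep the example self-contained.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors (Euclidean) of each row of X, excluding itself."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                       # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def neighborhood_distortion(X_high, X_low, k=20):
    """Mean Jaccard distance between k-NN sets before and after embedding.

    Assumption: Jaccard distance on k-NN sets stands in for the
    "local neighborhood" comparison described in the abstract.
    0.0 means every neighborhood is preserved; 1.0 means none overlap.
    """
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k)
    distances = []
    for a, b in zip(nn_high, nn_low):
        shared = len(set(a.tolist()) & set(b.tolist()))
        union = 2 * k - shared                         # both sets have exactly k members
        distances.append(1.0 - shared / union)
    return float(np.mean(distances))


def pca_embed(X, n_components):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 20))                     # synthetic stand-in for expression data
    for d in (2, 5, 10, 20):
        score = neighborhood_distortion(X, pca_embed(X, d), k=10)
        print(f"PCA to {d:2d} dimensions: mean Jaccard distance = {score:.3f}")
```

On this synthetic Gaussian data, which has no low-dimensional structure, the score should fall as more components are retained and approach zero when all 20 are kept, since a full-rank PCA is a rotation that preserves pairwise distances. The choice of k and of the distance measure are free parameters in this sketch.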

Publisher

Cold Spring Harbor Laboratory

References: 50 articles.

1. Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data

2. Differentiation dynamics of mammary epithelial cells revealed by single-cell RNA sequencing

3. Variable bandwidth diffusion kernels; Applied and Computational Harmonic Analysis; 2016

4. Integrating single-cell transcriptomic data across different conditions, technologies, and species

5. The single-cell transcriptional landscape of mammalian organogenesis

Cited by 39 articles.

1. Data Visualization in Large Scale Based on Trained Data; Advances in Data Mining and Database Management; 2024-04-19

2. Principled and interpretable alignability testing and integration of single-cell data; Proceedings of the National Academy of Sciences; 2024-02-28

3. Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters; Nature Communications; 2024-02-26

4. cellstruct: Metrics scores to quantify the biological preservation between two embeddings; 2023-11-14

5. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership; Genome Biology; 2023-10-19