Gaining Biological Insights through Supervised Data Visualization

Published: 2023-11-23

Author:

Rhodes Jake S., Aumon Adrien, Morin Sacha, Girard Marc, Larochelle Catherine, Brunet-Ratnasingham Elsa, Pagliuzza Amélie, Marchitto Lorie, Zhang Wei, Cutler Adele, Grand’Maison Francois, Zhou Anhong, Finzi Andrés, Chomont Nicolas, Kaufmann Daniel E., Zandee Stephanie, Prat Alexandre, Wolf Guy, Moon Kevin R.

Abstract

Dimensionality reduction-based data visualization is pivotal in comprehending complex biological data. The most common methods, such as PHATE, t-SNE, and UMAP, are unsupervised and therefore reflect the dominant structure in the data, which may be independent of expert-provided labels. Here we introduce a supervised data visualization method called RF-PHATE, which integrates expert knowledge for further exploration of the data. RF-PHATE leverages random forests to capture intricate feature-label relationships. Extracting information from the forest, RF-PHATE generates low-dimensional visualizations that highlight relevant data relationships while disregarding extraneous features. This approach scales to large datasets and applies to classification and regression. We illustrate RF-PHATE’s prowess through three case studies. In a multiple sclerosis study using longitudinal clinical and imaging data, RF-PHATE unveils a sub-group of patients with non-benign relapsing-remitting multiple sclerosis, demonstrating its aptitude for time-series data. In the context of Raman spectral data, RF-PHATE effectively showcases the impact of antioxidants on diesel exhaust-exposed lung cells, highlighting its proficiency in noisy environments. Furthermore, RF-PHATE aligns established geometric structures with COVID-19 patient outcomes, enriching interpretability in a hierarchical manner. RF-PHATE bridges expert insights and visualizations, promising knowledge generation. Its adaptability, scalability, and noise tolerance underscore its potential for widespread adoption.

Publisher

Cold Spring Harbor Laboratory
