Towards a comprehensive visualisation of structure in large scale data sets

Published: 2024-09-01 Issue: 3 Volume: 5 Page: 030503
ISSN: 2632-2153
Container-title: Machine Learning: Science and Technology
Short-container-title: Mach. Learn.: Sci. Technol.

Author:

Garriga Joan, Bartumeus Frederic

Abstract

Dimensionality reduction methods are fundamental to the exploration and visualisation of large data sets. Basic requirements for unsupervised data exploration are flexibility and scalability. However, current methods have computational limitations that restrict our ability to explore data structures to the lower range of scales. We focus on t-SNE and propose a chunk-and-mix protocol that enables the parallel implementation of this algorithm, as well as a self-adaptive parametric scheme that facilitates its parametric configuration. As a proof of concept, we present the pt-SNE algorithm, a parallel version of Barnes-Hut-SNE (an $O(n \log n)$ implementation of t-SNE). In pt-SNE, a single free parameter for the size of the neighbourhood, namely the perplexity, modulates the visualisation of the data structure at different scales, from local to global. Thanks to parallelisation, the runtime of the algorithm remains almost independent of the perplexity, which extends the range of scales that can be analysed. The pt-SNE algorithm converges to a good global embedding comparable to current solutions, although it adds a small amount of noise at the local scale. This noise illustrates an unavoidable trade-off between computational speed and accuracy. We expect the same approach to be applicable to embedding algorithms faster than Barnes-Hut-SNE, such as Fast-Fourier Interpolation-based t-SNE or Uniform Manifold Approximation and Projection, thus extending the state of the art and allowing a more comprehensive visualisation and analysis of data structures.
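To make the role of the perplexity concrete: in standard t-SNE, the perplexity of the conditional distribution around a point is 2 raised to its Shannon entropy, and a per-point Gaussian bandwidth is tuned until that perplexity matches the chosen value. The following is a minimal sketch of that standard calibration in Python with NumPy. It is not the pt-SNE implementation, and the function names `perplexity_of` and `conditional_probabilities` are hypothetical, chosen here for illustration only.

```python
import numpy as np


def perplexity_of(p):
    """Perplexity 2**H(p) of a discrete distribution p (entropy in bits)."""
    p = p[p > 0]  # zero-probability terms contribute nothing to the entropy
    return 2.0 ** (-np.sum(p * np.log2(p)))


def conditional_probabilities(sq_dists, target_perplexity, tol=1e-5, max_iter=100):
    """Gaussian affinities p_{j|i} for one point i.

    sq_dists holds the squared distances from point i to every other point.
    The precision beta = 1 / (2 * sigma**2) is found by bisection so that
    the perplexity of the resulting distribution matches the target.
    """
    beta, beta_lo, beta_hi = 1.0, 0.0, np.inf
    shifted = sq_dists - sq_dists.min()  # shift for numerical stability
    for _ in range(max_iter):
        weights = np.exp(-beta * shifted)
        p = weights / weights.sum()
        perp = perplexity_of(p)
        if abs(perp - target_perplexity) < tol:
            break
        if perp > target_perplexity:
            # Distribution too flat: narrow the kernel by raising beta.
            beta_lo = beta
            beta = beta * 2.0 if np.isinf(beta_hi) else 0.5 * (beta + beta_hi)
        else:
            # Distribution too peaked: widen the kernel by lowering beta.
            beta_hi = beta
            beta = 0.5 * (beta + beta_lo)
    return p, beta
```

Under these assumptions, a larger target perplexity yields a smaller `beta` (a wider kernel), so each point weighs more neighbours and the embedding reflects structure at a larger scale.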

Funder

Spanish Ministry

Publisher

IOP Publishing

Link

https://iopscience.iop.org/article/10.1088/2632-2153/ad6fea/pdf
