Affiliation:
1. Centre for Advances in Reliability and Safety (CAiRS), Hong Kong SAR 999077, China
2. Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong SAR 999077, China
Abstract
Manifold learning-based approaches have emerged as prominent techniques for dimensionality reduction. Among these methods, t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) stand out as two of the most widely used and effective. Although the two methods follow similar underlying procedures, empirical observations indicate that they differ in two respects: preservation of global data structure and computational efficiency. However, the mathematical principles behind these differences remain unclear. To address this gap, this study presents a comparative analysis of the subprocesses involved in the two methods. By examining their equation formulations in detail, the study identifies the mathematical mechanisms that account for the differences in global data structure preservation and computational efficiency. To validate the theoretical analysis, data are collected through a laboratory experiment, and an open-source dataset is used to check that the findings hold across different datasets. The consistent results obtained on both balanced and unbalanced datasets support the study's findings. These insights provide a deeper understanding of the mathematical underpinnings of t-SNE and UMAP, enabling more informed and effective use of these dimensionality reduction techniques in applications such as anomaly detection, natural language processing, and bioinformatics.
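The abstract does not state which formulation differences the study identifies. As background only, the sketch below contrasts the low-dimensional similarity functions of the standard t-SNE and UMAP formulations, one commonly discussed difference between them; it is not presented as the study's analysis. t-SNE uses a Student-t kernel normalized over all pairs, so every similarity depends on every other point, while UMAP uses the kernel 1/(1 + a·d^(2b)) with no global normalization. The function names are hypothetical, and the defaults a = 1.577, b = 0.895 are assumed approximate values corresponding to UMAP's usual min_dist = 0.1 setting.

```python
import numpy as np

def pairwise_sq_dists(Y):
    # Squared Euclidean distances between all rows of Y (N x N).
    sq = np.sum(Y ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T
    return np.maximum(D, 0.0)  # clip tiny negatives from round-off

def tsne_low_dim_affinities(Y):
    # t-SNE: Student-t (Cauchy) kernel 1 / (1 + d^2), normalized by the
    # sum over ALL pairs, so each q_ij depends on every point in Y.
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def umap_low_dim_affinities(Y, a=1.577, b=0.895):
    # UMAP: 1 / (1 + a * d^(2b)); D already holds d^2, so D**b = d^(2b).
    # No global normalization: each w_ij depends only on its own pair.
    # a and b are assumed approximate defaults (min_dist = 0.1).
    W = 1.0 / (1.0 + a * pairwise_sq_dists(Y) ** b)
    np.fill_diagonal(W, 0.0)
    return W
```

Adding a distant point to the embedding changes every normalized t-SNE similarity, because the normalizer sums over all pairs, but it leaves each UMAP similarity unchanged. This coupling is one reason full t-SNE gradients involve an O(N²) sum unless they are approximated.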
Funder
Centre for Advances in Reliability and Safety