Abstract
Unsupervised learning makes manifest the underlying structure of data without curated training and specific problem definitions. However, the inference of relationships between data points is frustrated by the “curse of dimensionality” in high dimensions. Inspired by replica theory from statistical mechanics, we consider replicas of the system to tune the dimensionality and take the limit as the number of replicas goes to zero. The result is intensive embedding, which not only is isometric (preserving local distances) but also allows global structure to be more transparently visualized. We develop the Intensive Principal Component Analysis (InPCA) and demonstrate clear improvements in visualizations of the Ising model of magnetic spins, a neural network, and the dark energy cold dark matter (ΛCDM) model as applied to the cosmic microwave background.
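As a concrete illustration of the procedure the abstract summarizes, the sketch below embeds a set of discrete probability distributions using the intensive (zero-replica) limit of the Bhattacharyya overlap, followed by classical multidimensional scaling in which eigenvalues of either sign are retained (the Minkowski-like structure of the intensive embedding). This is a minimal sketch under stated assumptions, not the authors' reference implementation: the function name `inpca_embedding`, the clipping constant, and the input convention (distributions as rows of a matrix) are our own choices.

```python
import numpy as np

def inpca_embedding(P, n_components=2):
    """Minimal InPCA-style sketch.

    P: (n_models, n_outcomes) array; each row a discrete probability
    distribution. Returns coordinates and the signed eigenvalues of the
    kept components (negative values mark Minkowski-like directions).
    """
    # Pairwise Bhattacharyya overlaps: O_ij = sum_x sqrt(P_i(x) P_j(x))
    sqrtP = np.sqrt(P)
    overlap = sqrtP @ sqrtP.T
    # Intensive (zero-replica) squared distances: d2_ij = -log O_ij
    # (clip guards against log(0) for non-overlapping distributions)
    D2 = -np.log(np.clip(overlap, 1e-300, None))
    # Double-center as in classical MDS: W = -1/2 J D2 J
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    W = -0.5 * J @ D2 @ J
    # Eigendecompose; unlike ordinary PCA, keep the largest-|lambda|
    # modes regardless of sign
    vals, vecs = np.linalg.eigh(W)
    order = np.argsort(-np.abs(vals))[:n_components]
    coords = vecs[:, order] * np.sqrt(np.abs(vals[order]))
    return coords, vals[order]

# Usage: embed a few two-outcome (coin-flip) models
if __name__ == "__main__":
    ps = np.linspace(0.1, 0.9, 9)
    P = np.column_stack([ps, 1 - ps])
    coords, eigvals = inpca_embedding(P, n_components=2)
    print(coords.shape, eigvals)
```

Because the double-centered matrix of intensive distances need not be positive semidefinite, keeping negative eigenvalues (rather than discarding them, as ordinary PCA would) is what lets the global structure remain visible without distorting local distances.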
Publisher
Proceedings of the National Academy of Sciences