Stochastic cluster embedding
Published:2022-12-04
Issue:1
Volume:33
Page:
ISSN:0960-3174
Container-title:Statistics and Computing
language:en
Short-container-title:Stat Comput
Author:
Yang Zhirong, Chen Yuwei, Sedov Denis, Kaski Samuel, Corander Jukka
Abstract
Neighbor embedding (NE) aims to preserve pairwise similarities between data items and has been shown to yield an effective principle for data visualization. However, even the best existing NE methods, such as stochastic neighbor embedding (SNE), may leave large-scale patterns such as clusters hidden, despite strong signals being present in the data. To address this, we propose a new cluster visualization method based on the neighbor embedding principle. We first present a family of neighbor embedding methods that generalizes SNE by using non-normalized Kullback–Leibler divergence with a scale parameter. In this family, much better cluster visualizations often appear with a parameter value different from the one corresponding to SNE. We also develop efficient software that employs asynchronous stochastic block coordinate descent to optimize the new family of objective functions. Our experimental results demonstrate that the method consistently and substantially improves the visualization of data clusters compared with state-of-the-art NE approaches. The code of our method is publicly available at https://github.com/rozyangno/sce.
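As a rough illustration of the idea in the abstract, the sketch below evaluates a non-normalized KL divergence (also called I-divergence) between input similarities p and scaled embedding similarities s·q. The function names, the toy values, and the exact form of the objective are assumptions made for illustration; they are not taken from the paper or its code. The sketch only shows that one particular scale value recovers the normalized KL divergence used by SNE-style methods, while other scale values give a different objective.

```python
import math

def generalized_kl(p, q, s):
    """Non-normalized KL divergence between weights p and scaled weights s*q.

    Hypothetical helper for illustration: sum_i [p_i*log(p_i/(s*q_i)) - p_i + s*q_i].
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            total += pi * math.log(pi / (s * qi))
        total += s * qi - pi
    return total

def normalized_kl(p, q):
    """Ordinary KL divergence between p and q after normalizing q to sum to 1."""
    z = sum(q)
    return sum(pi * math.log(pi / (qi / z)) for pi, qi in zip(p, q) if pi > 0)

# Toy values (assumed): p sums to 1, q is left unnormalized.
p = [0.5, 0.3, 0.2]
q = [2.0, 1.0, 0.5]

# The scale that minimizes generalized_kl over s is sum(p) / sum(q);
# at that value the two divergences coincide.
s_star = sum(p) / sum(q)
print(generalized_kl(p, q, s_star), normalized_kl(p, q))

# A different scale gives a different (here larger) objective value.
print(generalized_kl(p, q, 2.0 * s_star))
```

Treating s as a free parameter instead of fixing it to the normalizing value is what turns the single objective into a family, which is the sense in which the abstract speaks of a parameter value "different from the one corresponding to SNE".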
Funder
Norges Forskningsråd
Publisher
Springer Science and Business Media LLC
Subject
Computational Theory and Mathematics,Statistics, Probability and Uncertainty,Statistics and Probability,Theoretical Computer Science
Cited by
2 articles.