Affiliation:
1. Ludwig-Maximilians-Universität München, Oettingenstr., München, Germany
2. University of California, California, CA
3. University of Vienna
Abstract
A huge object collection in high-dimensional space can often be clustered in more than one way; for instance, objects could be clustered by their shape or, alternatively, by their color. Each grouping represents a different view of the dataset. The new research field of non-redundant clustering addresses this class of problems. In this article, we follow the approach that different, non-redundant k-means-like clusterings may exist in different, arbitrarily oriented subspaces of the high-dimensional space. We assume that these subspaces (and optionally a further noise space without any cluster structure) are orthogonal to each other. This assumption enables a particularly rigorous mathematical treatment of the non-redundant clustering problem and thus a particularly efficient algorithm, which we call Nr-Kmeans (for non-redundant k-means). The superiority of our algorithm is demonstrated both theoretically and in extensive experiments. Further, we propose an extension of Nr-Kmeans that harnesses Hartigan’s dip test to identify the number of clusters for each subspace automatically.
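To make the orthogonality assumption concrete, the following minimal sketch evaluates a k-means-style cost that is summed over mutually orthogonal subspaces of a rotated space. It is not the Nr-Kmeans algorithm itself: the rotation `V` and the split of dimensions are taken as known here, whereas the article's method has to find them. The function names, the toy data, and the cluster centers are assumptions made purely for illustration.

```python
import numpy as np

def clustering_cost(Z, labels):
    """Sum of squared distances of each point to its cluster mean."""
    cost = 0.0
    for c in np.unique(labels):
        pts = Z[labels == c]
        cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

def non_redundant_cost(X, V, subspaces, labelings):
    """Rotate X by the orthogonal matrix V, then add up one k-means
    cost per subspace. `subspaces` lists the rotated coordinates that
    belong to each subspace; `labelings` holds one label vector each."""
    Z = X @ V
    return sum(clustering_cost(Z[:, dims], labels)
               for dims, labels in zip(subspaces, labelings))

rng = np.random.default_rng(0)
n = 200

# Two independent groupings of the same objects (toy "shape" and "color").
shape_labels = rng.integers(0, 2, size=n)
color_labels = rng.integers(0, 3, size=n)
shape_centers = np.array([[0.0, 0.0], [10.0, 10.0]])
color_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])

# Axis-aligned construction: dims 0-1 carry shape, dims 2-3 carry color,
# dim 4 is a noise space without any cluster structure.
Z_true = np.empty((n, 5))
Z_true[:, 0:2] = shape_centers[shape_labels] + rng.normal(scale=0.5, size=(n, 2))
Z_true[:, 2:4] = color_centers[color_labels] + rng.normal(scale=0.5, size=(n, 2))
Z_true[:, 4] = rng.normal(scale=0.5, size=n)

# Hide the structure behind an arbitrary orthogonal rotation.
V, _ = np.linalg.qr(rng.normal(size=(5, 5)))
X = Z_true @ V.T

subspaces = [[0, 1], [2, 3], [4]]
noise_labels = np.zeros(n, dtype=int)  # the noise space forms a single cluster

true_cost = non_redundant_cost(
    X, V, subspaces, [shape_labels, color_labels, noise_labels])
swapped_cost = non_redundant_cost(
    X, V, subspaces, [color_labels, shape_labels, noise_labels])

print("cost with matching labelings:", true_cost)
print("cost with swapped labelings: ", swapped_cost)
```

In this toy setting, pairing each subspace with its own grouping gives a much lower cost than swapping the two groupings, which illustrates why each subspace can carry a different, non-redundant view of the same objects.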
Publisher
Association for Computing Machinery (ACM)
Cited by
9 articles.
1. Multiple clusterings: Recent advances and perspectives;Computer Science Review;2024-05
2. Non-Redundant Image Clustering of Early Medieval Glass Beads;2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA);2023-10-09
3. Semi-Supervised Embedding of Attributed Multiplex Networks;Proceedings of the ACM Web Conference 2023;2023-04-30
4. Method of Selecting the Optimal Location of Barrier-Free Bus Stops Using Clustering;Emotional Artificial Intelligence and Metaverse;2022-11-03
5. The DipEncoder: Enforcing Multimodality in Autoencoders;Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining;2022-08-14