Efficient Algorithm for K-Multiple-Means

Published: 2024-03-12
Volume: 2, Issue: 1, Pages: 1-26
ISSN: 2836-6573
Container title: Proceedings of the ACM on Management of Data
Short container title: Proc. ACM Manag. Data
Language: en

Authors:

Fujiwara Yasuhiro¹, Kumagai Atsutoshi², Ida Yasutoshi², Nakano Masahiro¹, Nakatsuji Makoto³, Kimura Akisato¹

Affiliations:

1. NTT Communication Science Laboratories, Atsugi-shi, Kanagawa, Japan

2. NTT Computer and Data Science Laboratories, Musashino-shi, Tokyo, Japan

3. NTT Human Informatics Laboratories, Yokosuka-shi, Kanagawa, Japan

Abstract

K-Multiple-Means is an extension of K-means that represents each cluster by multiple means; it is used in many applications, such as image segmentation, load balancing, and blind-source separation. Because K-means uses only one mean to represent each cluster, it fails to capture non-spherical cluster structures in the data. K-Multiple-Means, in contrast, computes multiple means and groups them into a specified number c of clusters, so it can effectively capture non-spherical clusters. To obtain the clusters, K-Multiple-Means iteratively updates the similarity matrix of a bipartite graph between the data points and the multiple means, computing the leading c singular vectors of that matrix in each iteration. These repeated SVD computations, however, make K-Multiple-Means expensive on large-scale data. Our proposal, F-KMM, improves the efficiency of K-Multiple-Means by computing the singular vectors from a smaller similarity matrix between the multiple means, which is obtained from the similarity matrix of the bipartite graph. To compute the bipartite similarity matrix efficiently, we skip unnecessary distance computations and estimate lower bounds on the distances between the data points and the multiple means. Theoretically, the proposed approach guarantees the same clustering results as K-Multiple-Means, since it computes the singular vectors exactly from the similarity matrix between the multiple means. Experiments show that our approach is several orders of magnitude faster than previous clustering approaches that use multiple means.
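The abstract's central claim is that the leading singular vectors of the n × m bipartite similarity matrix can be computed exactly from a smaller m × m matrix between the means. The sketch below illustrates the standard linear-algebra identity behind that kind of reduction, under the assumption that the smaller matrix is the Gram matrix G = SᵀS: the eigenvectors of G are the right singular vectors v of S, its eigenvalues are σ², and each left singular vector follows exactly as u = Sv/σ. The abstract does not specify how F-KMM builds its matrix, so the choice of SᵀS, the function names `jacobi_eigen` and `singular_vectors_via_means`, and the toy matrix `S` are all hypothetical; the identity holds for any n × m matrix, including a degree-normalized S.

```python
import math


def jacobi_eigen(G, max_sweeps=100, tol=1e-24):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, V), where column j of V is the unit eigenvector
    belonging to eigenvalues[j].
    """
    m = len(G)
    A = [row[:] for row in G]
    V = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(max_sweeps):
        off = sum(A[p][q] ** 2 for p in range(m) for q in range(m) if p != q)
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(m):  # A <- A P (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(m):  # A <- P^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(m):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(m)], V


def singular_vectors_via_means(S, c):
    """Leading c singular triplets (sigma, u, v) of the n x m matrix S,
    computed from the m x m matrix G = S^T S instead of an n x m SVD."""
    n, m = len(S), len(S[0])
    # G[a][b] = sum_i S[i][a] * S[i][b]: similarity between means a and b.
    G = [[sum(S[i][a] * S[i][b] for i in range(n)) for b in range(m)] for a in range(m)]
    eigvals, V = jacobi_eigen(G)
    order = sorted(range(m), key=lambda j: eigvals[j], reverse=True)[:c]
    triplets = []
    for j in order:
        sigma = math.sqrt(max(eigvals[j], 0.0))  # eigenvalues of S^T S are sigma^2
        if sigma == 0.0:
            raise ValueError("S has fewer than c nonzero singular values")
        v = [V[a][j] for a in range(m)]  # right singular vector of S
        # Left singular vector recovered exactly from S and v: u = S v / sigma.
        u = [sum(S[i][a] * v[a] for a in range(m)) / sigma for i in range(n)]
        triplets.append((sigma, u, v))
    return triplets


# Hypothetical toy similarities: 6 data points (rows) x 3 means (columns).
S = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.9, 0.0],
     [0.0, 0.7, 0.3], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]]
for sigma, u, v in singular_vectors_via_means(S, c=2):
    print(f"sigma={sigma:.4f}")
```

Only the m × m eigenproblem depends on the number of means; the data points enter through forming G and the final products Sv, which cost O(nm²) for a dense S and less if each row of S has only a few nonzeros. Jacobi rotations are used here only to keep the sketch dependency-free; a library eigensolver such as `numpy.linalg.eigh` would normally replace `jacobi_eigen`. Note that forming SᵀS squares the condition number, so very small singular values lose relative accuracy; the leading c vectors used for clustering are the least affected.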

Funder

JSPS KAKENHI

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3639273
