Affiliation:
1. NTT Communication Science Laboratories, Atsugi-shi, Kanagawa, Japan
2. NTT Computer and Data Science Laboratories, Musashino-shi, Tokyo, Japan
3. NTT Human Informatics Laboratories, Yokosuka-shi, Kanagawa, Japan
Abstract
K-Multiple-Means is an extension of K-means that clusters data using multiple means and is used in many applications, such as image segmentation, load balancing, and blind-source separation. Because K-means represents each cluster with a single mean, it fails to capture non-spherical cluster structures in the data. K-Multiple-Means instead computes multiple means and groups them into a specified number c of clusters, so it can effectively capture non-spherical clusters. To obtain the clusters, K-Multiple-Means updates the similarity matrix of a bipartite graph between the data points and the multiple means by iteratively computing the leading c singular vectors of that matrix. These iterative SVD computations, however, make K-Multiple-Means expensive on large-scale data. Our proposal, F-KMM, increases the efficiency of K-Multiple-Means by computing the singular vectors from a smaller similarity matrix between the multiple means, which is obtained from the similarity matrix of the bipartite graph. To compute the similarity matrix of the bipartite graph efficiently, we estimate lower-bounding distances between the data points and the multiple means and use them to skip unnecessary distance computations. Theoretically, the proposed approach guarantees the same clustering results as K-Multiple-Means, since the singular vectors can be computed exactly from the similarity matrix between the multiple means. Experiments show that our approach is several orders of magnitude faster than previous clustering approaches that use multiple means.
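The idea of recovering singular vectors from a smaller matrix can be illustrated with a standard linear-algebra identity; the following is a minimal sketch under stated assumptions, not the F-KMM algorithm itself. It assumes an n x m bipartite similarity matrix B (n data points, m means, n much larger than m) and uses the m x m Gram matrix B^T B as a stand-in for the smaller matrix between the means. The function name `top_singular_vectors_via_gram` and the random test matrix are hypothetical, and any normalization the paper applies to B is omitted.

```python
import numpy as np

def top_singular_vectors_via_gram(B, c):
    """Leading c singular triplets of an n x m matrix B via its m x m Gram matrix.

    Assumption for illustration: the c leading singular values are strictly positive.
    """
    G = B.T @ B                                # small m x m matrix between the means
    evals, evecs = np.linalg.eigh(G)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:c]        # indices of the c largest eigenvalues
    sigma = np.sqrt(np.clip(evals[order], 0.0, None))  # singular values of B
    V = evecs[:, order]                        # right singular vectors (m x c)
    U = (B @ V) / sigma                        # left singular vectors (n x c)
    return U, sigma, V

# Hypothetical data: a random nonnegative similarity matrix.
rng = np.random.default_rng(0)
n, m, c = 500, 20, 3
B = rng.random((n, m))

U, sigma, V = top_singular_vectors_via_gram(B, c)

# Reference: full thin SVD of the n x m matrix.
U_ref, s_ref, Vt_ref = np.linalg.svd(B, full_matrices=False)
```

The eigendecomposition here costs O(m^3) after forming the Gram matrix, instead of decomposing the full n x m matrix at every iteration. The recovered vectors agree with the reference SVD up to sign, which is the usual ambiguity of singular vectors.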
Publisher
Association for Computing Machinery (ACM)