Balanced k-means revisited

Published: 2023 Issue: 2 Volume: 3 Page: 145-179
ISSN: 2771-392X
Container-title: Applied Computing and Intelligence
Short-container-title: ACI

Author:

de Maeyer Rieke¹, Sieranoja Sami², Fränti Pasi²

Affiliation:

1. Saarland Informatics Campus, Saarland University, Saarbrücken, Germany

2. Machine Learning Group, School of Computing, University of Eastern Finland, Joensuu, Finland

Abstract

The $k$-means algorithm aims at minimizing the variance within clusters without considering the balance of cluster sizes. Balanced $k$-means defines the partition as a pairing problem that enforces the cluster sizes to be strictly balanced, but the resulting algorithm is impractically slow ($\mathcal{O}(n^3)$). Regularized $k$-means addresses the problem using a regularization term including a balance parameter. It works reasonably well when the balance of the cluster sizes is a mandatory requirement but does not generalize well for soft balance requirements. In this paper, we revisit the $k$-means algorithm as a two-objective optimization problem with two goals contradicting each other: to minimize the variance within clusters and to minimize the difference in cluster sizes. The proposed algorithm implements a balance-driven variant of $k$-means which initially only focuses on minimizing the variance but adds more weight to the balance constraint in each iteration. The resulting balance degree is not determined by a control parameter that has to be tuned, but by the point of termination which can be precisely specified by a balance criterion.
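The abstract describes the idea only at a high level: start as plain k-means, put more weight on balance in every iteration, and stop once a balance criterion is met. The following is a minimal illustrative sketch of that idea, not the authors' algorithm. The cost function (squared distance plus a penalty proportional to the current cluster size), the geometric penalty schedule, and the names `balance_driven_kmeans`, `max_size_diff`, and `growth` are all assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balance_driven_kmeans(points, k, max_size_diff=1, growth=1.1,
                          max_iter=500, seed=0):
    """Illustrative balance-driven k-means (assumed cost and schedule).

    Stops when the largest and smallest clusters differ by at most
    max_size_diff points, or when max_iter passes have been made.
    """
    rng = random.Random(seed)
    n = len(points)
    centroids = [list(p) for p in rng.sample(points, k)]

    # Plain nearest-centroid assignment: variance is the only goal at first.
    labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
              for p in points]
    sizes = [labels.count(j) for j in range(k)]

    # Assumed schedule: the first penalty increment is scaled by the
    # initial mean squared error so it is small relative to the data.
    mse = sum(sq_dist(p, centroids[labels[i]])
              for i, p in enumerate(points)) / n
    step = max(mse, 1e-9) / n
    penalty = 0.0

    for _ in range(max_iter):
        # Reassign points one at a time, updating sizes immediately.
        for i, p in enumerate(points):
            sizes[labels[i]] -= 1
            labels[i] = min(
                range(k),
                key=lambda j: sq_dist(p, centroids[j]) + penalty * sizes[j],
            )
            sizes[labels[i]] += 1

        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]

        # Balance criterion: the termination point decides the balance degree.
        if max(sizes) - min(sizes) <= max_size_diff:
            break

        # Put more weight on balance in the next iteration.
        penalty += step
        step *= growth

    return labels, centroids, sizes
```

Calling `balance_driven_kmeans(points, 3)` on a list of coordinate tuples returns the labels, centroids, and cluster sizes. In this sketch, a looser `max_size_diff` ends the loop earlier and keeps the result closer to plain k-means, which mirrors the abstract's point that the termination criterion, rather than a tuned weight, sets the degree of balance.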

Publisher

American Institute of Mathematical Sciences (AIMS)

Reference: 47 articles.

1. F. Kovács, C. Legány, A. Babos, Cluster validity measurement techniques, Proceedings of the 5th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases, (2006).

2. A. K. Jain, Data clustering: 50 years beyond k-means, Pattern Recogn. Lett., 31 (2010), 651-666. https://doi.org/10.1016/j.patrec.2009.09.011

3. X. Wu, V. Kumar, R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, et al., Top 10 algorithms in data mining, Knowl. Inf. Syst., 14 (2007), 1-37. https://doi.org/10.1007/s10115-007-0114-2

4. S. P. Lloyd, Least squares quantization in PCM, IEEE T. Inform. Theory, 28 (1982), 129-137. https://doi.org/10.1109/TIT.1982.1056489

5. E. W. Forgy, Cluster analysis of multivariate data: efficiency versus interpretability of classifications, Biometrics, 21 (1965), 768-769.

Cited by 3 articles.

1. TA3D: Timing-Aware 3D IC Partitioning and Placement by Optimizing the Critical Path;Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD;2024-09-09

2. Balancing the cardinality of clusters with a distance constraint: a fast algorithm;Annals of Operations Research;2024-05-09

3. Unsupervised Deep Clustering With Hard Balanced Constraint: Application in Disciplinary-Focused Student Section Formation;IEEE Access;2024