Improving Scalable K-Means++

Published:2020-12-27 Issue:1 Volume:14 Page:6
ISSN:1999-4893
Container-title:Algorithms
language:en
Short-container-title:Algorithms

Author:

Hämäläinen Joonas, Kärkkäinen Tommi, Rossi Tuomo

Abstract

Two new initialization methods for K-means clustering are proposed. Both are based on applying a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared to the K-means++ and K-means‖ methods using an extensive set of reference and synthetic large-scale datasets. For the latter, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state of the art by improving clustering accuracy and the speed of convergence. We also observe that the currently most popular initialization, K-means++, behaves like random initialization in very high-dimensional cases.
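For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of standard K-means++ seeding (D² sampling) and of a Gaussian random projection to a lower-dimensional subspace. It is not the authors' proposed method: the function names `kmeanspp_seed` and `random_projection`, the Gaussian projection matrix, and its scaling are assumptions made for illustration only.

```python
import numpy as np


def kmeanspp_seed(X, k, rng):
    """Pick k initial centers from the rows of X by D^2 sampling (K-means++)."""
    n = X.shape[0]
    # The first center is chosen uniformly at random.
    centers = [X[rng.integers(n)]]
    # Squared distance from every point to its nearest chosen center.
    d2 = np.sum((X - centers[0]) ** 2, axis=1)
    for _ in range(1, k):
        # Sample the next center with probability proportional to d2.
        idx = rng.choice(n, p=d2 / d2.sum())
        centers.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centers)


def random_projection(X, d, rng):
    """Project X onto d dimensions with a Gaussian random matrix (assumed form)."""
    R = rng.standard_normal((X.shape[1], d)) / np.sqrt(d)
    return X @ R
```

One way to use the sketch is to seed in a projected space, for example `kmeanspp_seed(random_projection(X, 10, rng), 5, rng)` with `rng = np.random.default_rng(0)`. How the paper combines projections with the divide-and-conquer step is not described in this record.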

Funder

Academy of Finland

Publisher

MDPI AG

Subject

Computational Mathematics,Computational Theory and Mathematics,Numerical Analysis,Theoretical Computer Science

Link

https://www.mdpi.com/1999-4893/14/1/6/pdf

References: 54 articles.

1. Least squares quantization in PCM

2. A comparative study of efficient initialization methods for the k-means clustering algorithm

3. How much can k-means be improved by using better initialization and repeats?

4. Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering

Cited by 22 articles.

1. Varroa destructor detection on honey bees using hyperspectral imagery;Computers and Electronics in Agriculture;2024-09

2. An inversion-based clustering approach for complex clusters;BMC Research Notes;2024-05-12

3. Site selection and optimization for urban epidemic prevention material distribution points;Seventh International Conference on Traffic Engineering and Transportation System (ICTETS 2023);2024-02-20

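4. Three-stage stepwise detection algorithm for zero-padded dual-mode optical orthogonal frequency division multiplexing with index modulation;Acta Optica Sinica;2024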

5. Analysis of the spatio-temporal pattern of restaurant performance using POI data and STFTiS model;Journal of Geomatics Science and Technology;2023-12-01