DETERMINISTIC INITIALIZATION OF THE K-MEANS ALGORITHM USING HIERARCHICAL CLUSTERING

Published:2012-11 Issue:07 Volume:26 Page:1250018
ISSN:0218-0014
Container-title:International Journal of Pattern Recognition and Artificial Intelligence
language:en
Short-container-title:Int. J. Patt. Recogn. Artif. Intell.

Author:

CELEBI M. EMRE¹, KINGRAVI HASSAN A.²

Affiliation:

1. Department of Computer Science, Louisiana State University, Shreveport, LA, USA

2. School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA

Abstract

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. Many of these methods, however, have superlinear complexity in the number of data points, making them impractical for large data sets. On the other hand, linear methods are often random and/or order-sensitive, which renders their results unrepeatable. Recently, Su and Dy proposed two highly successful hierarchical initialization methods named Var-Part and PCA-Part that are not only linear, but also deterministic (nonrandom) and order-invariant. In this paper, we propose a discriminant analysis based approach that addresses a common deficiency of these two methods. Experiments on a large and diverse collection of data sets from the UCI machine learning repository demonstrate that Var-Part and PCA-Part are highly competitive with one of the best random initialization methods to date, i.e. k-means++, and that the proposed approach significantly improves the performance of both hierarchical methods.
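To make the idea of a linear, deterministic, order-invariant hierarchical initialization more concrete, here is a minimal sketch in the spirit of Var-Part. The abstract does not spell out the procedure, so the details below are assumptions based on how variance-based partitioning is commonly described: repeatedly pick the cluster with the largest sum of squared errors, split it at the mean of its highest-variance coordinate, and use the resulting cluster means as initial centers. The function name `var_part_init` is hypothetical, and the sketch does not include the discriminant analysis refinement proposed in the paper.

```python
import numpy as np

def var_part_init(X, k):
    """Variance-based hierarchical initialization (illustrative sketch only).

    Assumed procedure, not taken from the abstract: split the cluster with
    the largest SSE along its highest-variance axis, at the mean, until
    k clusters exist; return the cluster means.
    """
    X = np.asarray(X, dtype=float)
    clusters = [np.arange(len(X))]  # start with one cluster holding every point
    while len(clusters) < k:
        # Pick the cluster with the largest sum of squared errors (SSE).
        sse = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
        idx = clusters.pop(int(np.argmax(sse)))
        pts = X[idx]
        # Split along the coordinate axis with the largest variance, at its mean.
        d = int(np.argmax(pts.var(axis=0)))
        left = pts[:, d] <= pts[:, d].mean()
        if left.all() or not left.any():
            raise ValueError("cannot split further: too few distinct points")
        clusters += [idx[left], idx[~left]]
    # The initial centers are the means of the k resulting clusters.
    return np.array([X[idx].mean(axis=0) for idx in clusters])

# Two well-separated groups along the first coordinate.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
centers = var_part_init(X, 2)
print(centers)
```

No random numbers are involved, so repeated runs on the same data return the same centers. The returned array could then seed any k-means implementation, for example by passing it as the `init` argument of scikit-learn's `KMeans` with `n_init=1`.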

Publisher

World Scientific Pub Co Pte Ltd

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218001412500188

Cited by 48 articles.

1. K-NNDP: K-means algorithm based on nearest neighbor density peak optimization and outlier removal;Knowledge-Based Systems;2024-06

2. Color-to-gray image conversion using salient colors and radial basis functions;Journal of Electronic Imaging;2024-02-22

3. CAP-RNAseq: an integrated pipeline for functional annotation and prioritization of co-expression clusters;Briefings in Bioinformatics;2024-01-22

4. Exploring the joint impacts of income, car ownership, and built environment on daily activity patterns: a cluster analysis of trip chains;Transportmetrica A: Transport Science;2023-07-19

5. Forty years of color quantization: a modern, algorithmic survey;Artificial Intelligence Review;2023-04-27