Authors:
Huang Xiaodi, Ren Minglun, Zhu Xiaoxi
Abstract
As a classical clustering algorithm, K-means has been widely applied because its underlying mathematics is simple, it converges quickly, it has low complexity, and it is easy to implement. However, the K-means algorithm requires users to set the desired number of clusters in advance, and the initial cluster centers are usually generated at random. When dealing with unfamiliar datasets for which users lack sufficient domain knowledge, such parameter-setting strategies not only increase the burden on users but also make clustering quality difficult to guarantee. Therefore, in view of the high sensitivity of the K-means clustering process to its initial parameters, this paper proposes an improved DDWK-means (Distance-Density-Weight K-means) algorithm. Based on distance-density features and the inertia-weight method of the particle swarm optimization algorithm, the optimal initial cluster centers can not only be determined adaptively from the structural characteristics of the dataset itself, without introducing artificial parameters, but also be adjusted dynamically in response to changes in the threshold of the clustering quality metric. We conduct an experimental study on five standard datasets from the UCI (University of California, Irvine) repository, and the results indicate that the DDWK-means algorithm significantly improves clustering efficiency and stability.
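The abstract does not spell out the DDWK-means procedure, so the following is only a generic sketch of the idea it describes: seeding K-means deterministically from a distance-density score instead of random picks. Everything specific here is an assumption, not the authors' method: the cutoff radius (mean pairwise distance), the density definition (count of neighbors inside that radius), the seeding score (density times distance to the nearest chosen seed), and the hypothetical helper names `density_distance_init` and `kmeans`. The particle-swarm inertia-weight adjustment and the adaptive choice of the number of clusters are not reproduced; `k` is passed in explicitly.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def density_distance_init(points: Sequence[Point], k: int) -> List[List[float]]:
    """Pick k seed centers deterministically from a distance-density score.

    Hypothetical helper: the radius, density and score definitions below are
    illustrative assumptions, not the DDWK-means formulas.
    """
    n = len(points)
    if n < 2 or not 1 <= k <= n:
        raise ValueError("need at least 2 points and 1 <= k <= len(points)")
    # Full pairwise distance matrix (O(n^2) memory; acceptable for a sketch).
    d = [[math.dist(p, q) for q in points] for p in points]
    # Assumed cutoff radius: mean distance over all ordered pairs i != j
    # (the zero diagonal adds nothing), so no user-supplied radius is needed.
    radius = sum(map(sum, d)) / (n * (n - 1))
    # Local density: number of other points strictly inside the radius.
    density = [sum(1 for j in range(n) if j != i and d[i][j] < radius)
               for i in range(n)]
    # First seed: the densest point (ties resolved by lowest index).
    chosen = [max(range(n), key=lambda i: density[i])]
    while len(chosen) < k:
        # Next seed: a point that is both dense and far from every seed so far;
        # isolated outliers have density 0 and therefore score 0.
        candidates = [i for i in range(n) if i not in chosen]
        chosen.append(max(candidates,
                          key=lambda i: density[i] * min(d[i][c] for c in chosen)))
    return [list(points[i]) for i in chosen]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           tol: float = 1e-9) -> Tuple[List[List[float]], List[int]]:
    """Standard Lloyd iterations seeded by density_distance_init."""
    centers = density_distance_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers: List[List[float]] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        # Stop once no center moves more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels


if __name__ == "__main__":
    # Two well-separated unit squares.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    centers, labels = kmeans(pts, 2)
    print(centers)  # [[0.5, 0.5], [10.5, 10.5]]
    print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
```

On the two squares, the seeding picks one corner of each square and the Lloyd iterations converge to the square centers in two passes. Because the seeding involves no random draws, repeated runs always return the same clustering, which random initialization does not guarantee.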
Subject
General Physics and Astronomy