Finding the Number of Clusters in Data and Better Initial Centers for K-means Algorithm

Published:2020-12-08 Issue:6 Volume:12 Page:1-20
ISSN:2074-904X
Container-title:International Journal of Intelligent Systems and Applications
Short-container-title:IJISA

Author:

Fahim Ahmed

Abstract

K-means is the best-known algorithm for data clustering in data mining. Its main advantages are its simplicity, its fast convergence to a local minimum, and its linear time complexity. Its most important open problems are the selection of initial centers and the need to fix the number of clusters in advance. This paper proposes a solution to both problems at once by adding a preprocessing step that yields the expected number of clusters in the data and better initial centers. Many studies address each of these problems separately, but none solves both together. The preprocessing step requires O(n log n) time, where n is the size of the dataset. It produces an initial partitioning of the data without requiring the number of clusters in advance, then computes the means of the initial clusters. K-means is then applied to the original data, using the information from the preprocessing step, to obtain the final clusters. The proposed method is tested on many benchmark datasets, and the experimental results show its efficiency.
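
The pipeline described in the abstract (partition first, take the means, then run k-means from those means) can be sketched roughly as follows. The abstract does not say how the preprocessing step builds its initial partition, so `initial_partition` below is a hypothetical stand-in, not the paper's method: it sorts the points by their first coordinate and cuts at unusually large gaps, which happens to cost O(n log n) because of the sort. The names `gap_factor`, `mean`, and `kmeans` are likewise illustrative assumptions.

```python
import math


def initial_partition(points, gap_factor=3.0):
    """Hypothetical stand-in for the preprocessing step (NOT the paper's method).

    Sort by the first coordinate and start a new group wherever the gap to the
    previous point exceeds gap_factor times the mean gap. The sort dominates
    the cost, so this runs in O(n log n).
    """
    ordered = sorted(points, key=lambda p: p[0])
    gaps = [b[0] - a[0] for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return [ordered]
    threshold = gap_factor * (sum(gaps) / len(gaps))
    groups = [[ordered[0]]]
    for point, gap in zip(ordered[1:], gaps):
        if gap > threshold:
            groups.append([])
        groups[-1].append(point)
    return groups


def mean(group):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(group[0])
    return tuple(sum(p[d] for p in group) / len(group) for d in range(dim))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations started from the supplied centers."""
    centers = list(centers)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the previous center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Toy data with two well-separated groups; k is never passed in explicitly.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
groups = initial_partition(points)
seeds = [mean(g) for g in groups]
centers, clusters = kmeans(points, seeds)
print(len(centers), "clusters found")
```

In this sketch the number of clusters is simply the number of groups the stand-in partition returns, and k-means only refines the seed means; any partitioning routine with the same interface could be swapped in.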

Publisher

MECS Publisher

Subject

Artificial Intelligence, Control and Optimization, Computer Networks and Communications, Computer Science Applications, Human-Computer Interaction, Modeling and Simulation, Signal Processing

Link

http://www.mecs-press.org/ijisa/ijisa-v12-n6/IJISA-V12-N6-1.pdf

Cited by 5 articles.

1. K-means clustering hybridized with nature inspired optimization algorithm: A review;AIP Conference Proceedings;2024

2. A detection method for false data injection attacks in power systems based on artificial fish swarm K-means clustering algorithm;2023 9th International Conference on Big Data and Information Analytics (BigDIA);2023-12-15

3. Analysis of communities of countries with similar dynamics of the COVID-19 pandemic evolution;Journal of Dynamics & Games;2022

4. Cluster Analysis of the Loading Time-Series with the Aim of Consistent Durability Estimation;Advances in Artificial Systems for Power Engineering II;2022

5. Analysis of communities of countries with similar dynamics of the COVID-19 pandemic evolution;2021-01-20