Abstract
K-means is the best-known algorithm for data clustering in data mining. Its main advantages are its simplicity, its fast convergence to a local minimum, and its linear time complexity. The most important open problems for this algorithm are the selection of initial centers and the need to fix the number of clusters in advance. This paper proposes a single solution to both problems: a preprocessing step that estimates the number of clusters in the data and supplies better initial centers. Many studies address each of these problems separately, but none addresses both together. The preprocessing step requires O(n log n) time, where n is the size of the dataset. It produces an initial partitioning of the data without the number of clusters being specified in advance, and then computes the means of the initial clusters. K-means is then applied to the original data, using the information obtained from the preprocessing step, to obtain the final clusters. We test the proposed method on many benchmark datasets. The experimental results show the efficiency of the proposed method.
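The second stage described above (computing the means of the initial clusters, then running k-means from those means) can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not specify how the preprocessing step builds the initial partition, so the labels here are simply taken as given, and the function names `cluster_means` and `kmeans_from_centers` are hypothetical, not the authors' implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def cluster_means(points, labels):
    """Mean of each initial cluster; the number of clusters is implied by the labels."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    # One mean per label, ordered by label value.
    return [tuple(sum(c) / len(g) for c in zip(*g)) for _, g in sorted(groups.items())]


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: dist(p, centers[j]))


def kmeans_from_centers(points, centers, max_iter=100):
    """Standard Lloyd iterations started from the supplied centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        assign = [nearest(p, centers) for p in points]
        # Update step: recompute each center; keep the old one if its cluster is empty.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]


# Hypothetical data and a deliberately imperfect initial partition (assumed, not from the paper).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
rough_labels = [0, 0, 0, 0, 1, 1]

initial_centers = cluster_means(points, rough_labels)
final_centers, final_labels = kmeans_from_centers(points, initial_centers)
print(final_labels)
```

In this sketch the number of clusters is never passed explicitly; it follows from the distinct labels in the initial partition, which mirrors the idea of not fixing k in advance.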
Subject
Artificial Intelligence, Control and Optimization, Computer Networks and Communications, Computer Science Applications, Human-Computer Interaction, Modeling and Simulation, Signal Processing
Cited by
5 articles.