Affiliation:
1. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, China
2. Northeast Normal University, Changchun, China
Abstract
Background:
The K-means algorithm proceeds in two steps: initialization and subsequent iterations. Initialization selects the initial cluster centers, while the subsequent iterations repeatedly update the centers until they no longer change or the maximum number of iterations is reached. K-means is highly sensitive to the cluster centers chosen during initialization: different initial centers can noticeably change the algorithm's performance. Improving the initialization step is therefore an important way to improve K-means.
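For reference, the following is a minimal sketch of this standard two-step procedure (Lloyd-style K-means with random initialization); the function and parameter names are illustrative and are not taken from the paper.

import numpy as np

def kmeans(X, k, max_iter=100, seed=None):
    # Plain Lloyd-style K-means: random initialization, then
    # assignment/update iterations until the centers stop moving
    # or max_iter is reached. Illustrative sketch only.
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points;
        # keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels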
Methods:
This paper uses a new strategy to select the initial cluster centers. It first computes the minimum and maximum values of the data along a chosen one-dimensional index (for lower-dimensional data, such as two-dimensional data, the feature with the largest variance or the distance to the origin can be used; for higher-dimensional data, PCA can be used to select the principal component with the largest variance) and divides this range into equally sized sub-ranges. The sub-ranges are then adjusted according to the data distribution so that each sub-range contains as much data as possible. Finally, the mean of the data in each sub-range is computed and used as an initial cluster center.
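The sketch below illustrates one possible reading of this initialization strategy. The dimensionality threshold, the quantile-based fallback used when an equal-width sub-range is empty (standing in for the paper's distribution-based adjustment), and all names are assumptions, since the abstract does not fix these details.

import numpy as np

def range_based_init(X, k):
    # Sketch of the range-based initialization described above.
    d = X.shape[1]
    # 1. Choose the 1-D index used to order the data: the feature with the
    #    largest variance for low-dimensional data, otherwise the first
    #    principal component (the "d <= 3" threshold is an assumption).
    if d <= 3:
        proj = X[:, X.var(axis=0).argmax()]
    else:
        Xc = X - X.mean(axis=0)
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        proj = Xc @ vt[0]
    # 2. Split [min, max] of the index into k equal-width sub-ranges.
    edges = np.linspace(proj.min(), proj.max(), k + 1)
    bins = np.clip(np.digitize(proj, edges[1:-1]), 0, k - 1)
    # 3. Adjustment step (simplified stand-in): if an equal-width sub-range
    #    is empty, fall back to quantile edges so every sub-range holds data;
    #    the paper's full adjustment would also handle degenerate cases.
    if len(np.unique(bins)) < k:
        edges = np.quantile(proj, np.linspace(0, 1, k + 1))
        bins = np.clip(np.digitize(proj, edges[1:-1]), 0, k - 1)
    # 4. The mean of the points in each sub-range becomes an initial center.
    centers = np.array([
        X[bins == j].mean(axis=0) if np.any(bins == j) else X.mean(axis=0)
        for j in range(k)
    ])
    return centers

The returned centers could then be fed to any standard K-means implementation, for example as the init array of scikit-learn's KMeans with n_init=1, so that only the subsequent iterations remain.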
Results:
Theoretical analysis shows that although the time complexity of the initialization process is linear, the algorithm exhibits the characteristics of superlinear initialization methods. The algorithm is applied to two-dimensional GPS data analysis and to high-dimensional network attack detection. Experimental results show that it achieves both high clustering quality and high clustering speed.
Conclusion:
The proposed method reduces the number of subsequent K-means iterations without compromising clustering performance, which makes it suitable for large-scale data clustering. It can be applied not only to low-dimensional data but also to high-dimensional data.
Funder
Science and Technology Planning Project of Jilin Province
National Social Science Fund of China
Publisher
Bentham Science Publishers Ltd.
Cited by: 5 articles.