An Improved K-means Distributed Clustering Algorithm Based on Spark Parallel Computing Framework

Published:2020-08-01 Issue:1 Volume:1616 Page:012065
ISSN:1742-6588
Container-title:Journal of Physics: Conference Series
Short-container-title:J. Phys.: Conf. Ser.

Author:

Lu Xin, Lu Huanghuang, Yuan Jiao, Wang Xun

Abstract

The traditional K-means distributed clustering algorithm has several problems when clustering big data, including unstable results, poor clustering quality, and low execution efficiency. This paper proposes a density-based method for selecting initial cluster centers to improve the K-means distributed clustering algorithm. The method computes the sample density, the distance between clusters, and the cluster compact density, defines the product of the three as the difference weight density, and selects the sample point with the maximum difference weight density as an initial cluster center, which addresses the randomness and low quality of initial center selection. The improved algorithm is implemented on the Spark parallel computing framework to further improve its performance on big data clustering. Experimental results show that the improved K-means distributed clustering algorithm based on Spark achieves higher execution efficiency and accuracy, as well as good stability, in big data clustering analysis.
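The abstract does not give the exact formulas, so the following is only a minimal single-machine sketch of the general idea, not the authors' method. Every definition in it is an assumption made for illustration: the cutoff radius (mean pairwise distance), the sample density (neighbor count within the radius), the compact density (inverse mean distance to neighbors), the separation term (distance to the nearest denser point), and the hypothetical function name `select_initial_centers`. The product of the three terms stands in for the "difference weight density".

```python
import math


def select_initial_centers(points, k):
    """Pick up to k initial centers by a density-weighted score (illustrative sketch)."""
    n = len(points)
    # Pairwise Euclidean distances (O(n^2); fine for a small illustration).
    d = [[math.dist(p, q) for q in points] for p in points]

    # Assumption: the neighborhood radius is the mean pairwise distance.
    pair_count = n * (n - 1) / 2
    radius = sum(d[i][j] for i in range(n) for j in range(i + 1, n)) / pair_count

    neighbors = [
        [j for j in range(n) if j != i and d[i][j] <= radius] for i in range(n)
    ]

    # Assumption: sample density = number of neighbors inside the radius.
    rho = [len(nb) for nb in neighbors]

    # Assumption: compact density = inverse of the mean distance to neighbors.
    compact = []
    for i in range(n):
        if neighbors[i]:
            mean_d = sum(d[i][j] for j in neighbors[i]) / len(neighbors[i])
            compact.append(1.0 / mean_d if mean_d > 0 else 0.0)
        else:
            compact.append(0.0)

    # Assumption: separation = distance to the nearest point of higher density,
    # or the largest distance from the point if no denser point exists.
    sep = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        sep.append(min(higher) if higher else max(d[i]))

    # Product of the three terms, standing in for the difference weight density.
    weight = [rho[i] * sep[i] * compact[i] for i in range(n)]

    centers = []
    available = set(range(n))
    while len(centers) < k and available:
        # Take the highest-weight remaining point, then drop its neighborhood
        # so the next center comes from a different dense region.
        best = max(sorted(available), key=lambda i: weight[i])
        centers.append(points[best])
        available -= {best, *neighbors[best]}
    return centers


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(select_initial_centers(blob_a + blob_b, k=2))
```

The function may return fewer than k centers when the neighborhoods of the chosen points cover the whole data set. The chosen points would then seed an ordinary K-means run in place of random initialization; in a Spark setting the distance and density computations are the parts one would expect to distribute across partitions, though how the paper does this is not stated in the abstract.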

Publisher

IOP Publishing

Subject

General Physics and Astronomy

Link

https://iopscience.iop.org/article/10.1088/1742-6596/1616/1/012065/pdf

References: 13 articles.

1. Points of Significance: Clustering;Altman;Nature Methods,2017

2. Robust global motion estimation for video security based on improved k-means clustering;Wu;Journal of Ambient Intelligence & Humanized Computing,2018

3. Variations on the Clustering Algorithm BIRCH;Lorbeer;Big Data Research,2018

4. Short-Term Wind Power Prediction Using GA-BP Neural Network Based on DBSCAN Algorithm Outlier Identification;Zhang;Processes,2020

Cited by 3 articles.

1. A Blockchain-Based Framework to Resolve the Oligopoly Issue in Cloud Computing;IEEE Transactions on Cloud Computing;2024-04

2. Analysis Model of Efficiency and Accuracy in Big Data Based on Clustering Algorithm K-means;2023 International Conference on Internet of Things, Robotics and Distributed Computing (ICIRDC);2023-12-29

3. Initial Clustering Based on the Swarm Intelligence Algorithm for Computing a Data Density Parameter;Computational Intelligence and Neuroscience;2022-06-10