A Parallel Multiobjective PSO Weighted Average Clustering Algorithm Based on Apache Spark

Author:

Ling Huidong¹, Zhu Xinmu¹, Zhu Tao¹, Nie Mingxing¹, Liu Zhenghai¹, Liu Zhenyu¹

Affiliation:

1. School of Computer Science, University of South China, Hengyang 421200, China

Abstract

Multiobjective clustering algorithms based on particle swarm optimization (PSO) have been applied successfully in several applications. However, existing algorithms are implemented on a single machine and cannot be directly parallelized on a cluster, which makes it difficult for them to handle large-scale data. With the development of distributed parallel computing frameworks, data parallelism has been proposed. However, increasing the degree of parallelism leads to unbalanced data distribution across partitions, which degrades the clustering result. In this paper, we propose a parallel multiobjective PSO weighted average clustering algorithm based on Apache Spark (Spark-MOPSO-Avg). First, using Apache Spark's distributed, in-memory computing, the entire data set is divided into multiple partitions and cached in memory. The local fitness value of each particle is calculated in parallel from the data in each partition. After this calculation, only particle information is transmitted; there is no need to transmit large numbers of data objects between nodes, which reduces network communication and thus effectively reduces the algorithm's running time. Second, the local fitness values are combined by a weighted average to mitigate the effect of unbalanced data distribution on the results. Experimental results show that the Spark-MOPSO-Avg algorithm achieves lower information loss under data parallelism, losing about 1% to 9% in accuracy, while effectively reducing the algorithm's time overhead. It shows good execution efficiency and parallel computing capability on a Spark distributed cluster.
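The weighted-average step can be illustrated with a minimal sketch. The abstract does not specify the fitness functions or the weights, so the following is an assumption-laden illustration, not the authors' implementation: each partition reports one local fitness value (here, the mean distance from its points to the nearest centroid encoded by a particle) together with its point count, and the local values are weighted by each partition's share of the data, n_i / N. The helper names (`local_fitness`, `weighted_average`, `global_fitness`) are hypothetical, and plain Python lists stand in for Spark partitions so the example runs without a cluster.

```python
"""Sketch: combining per-partition (local) fitness values by a weighted average.

Assumptions (not from the paper): a particle encodes a set of centroids, the
local fitness is the mean Euclidean distance from each point in a partition to
its nearest centroid, and partition weights are n_i / N.
"""
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_distance(point: Point, centroids: Sequence[Point]) -> float:
    """Euclidean distance from a point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)


def local_fitness(partition: Sequence[Point],
                  centroids: Sequence[Point]) -> Tuple[float, int]:
    """Runs where the partition lives: returns (mean distance, point count).

    Only these two numbers, not the raw points, need to be sent back.
    """
    if not partition:
        return 0.0, 0
    total = sum(nearest_distance(p, centroids) for p in partition)
    return total / len(partition), len(partition)


def plain_average(local: Sequence[Tuple[float, int]]) -> float:
    """Unweighted mean over non-empty partitions (sensitive to imbalance)."""
    values = [f for f, n in local if n > 0]
    return sum(values) / len(values)


def weighted_average(local: Sequence[Tuple[float, int]]) -> float:
    """Mean of local fitness values weighted by partition share n_i / N."""
    total_n = sum(n for _, n in local)
    return sum(f * n / total_n for f, n in local)


def global_fitness(partitions: Sequence[Sequence[Point]],
                   centroids: Sequence[Point]) -> float:
    """Reference value computed over all points at once (no partitioning)."""
    points = [p for part in partitions for p in part]
    return sum(nearest_distance(p, centroids) for p in points) / len(points)


if __name__ == "__main__":
    centroids: List[Point] = [(0.0,), (10.0,)]  # one hypothetical particle
    # Deliberately unbalanced partitions: 90 points vs 10 points.
    partitions = [[(1.0,)] * 90, [(5.0,)] * 10]
    local = [local_fitness(part, centroids) for part in partitions]
    print(f"plain:    {plain_average(local):.2f}")     # plain:    3.00
    print(f"weighted: {weighted_average(local):.2f}")  # weighted: 1.40
    print(f"global:   {global_fitness(partitions, centroids):.2f}")  # global:   1.40
```

Under this assumed weighting, the weighted average of per-partition means equals the mean over the whole data set, whereas the plain average over-counts the small partition (3.00 vs 1.40 in the run above). In PySpark, the per-partition step would map naturally onto `RDD.mapPartitions`, with only the small (fitness, count) pairs returned to the driver rather than the raw points. For a multiobjective fitness, each objective value could be averaged the same way.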

Funder

National Natural Science Foundation of China

Natural Science Foundation of Hunan Province

Research Foundation of Education Bureau of Hunan Province

Hengyang Science and Technology Major Project

Publisher

MDPI AG

Subject

General Physics and Astronomy


Cited by 4 articles.

1. A Novel Algorithm for Enhancing Terrain-Aided Navigation in Autonomous Underwater Vehicles;Information;2024-09-02

2. TSKPSO: Spark-Based Multiple Kernel Particle Swarm Optimization Algorithm for Big Data Clustering;2024 MIT Art, Design and Technology School of Computing International Conference (MITADTSoCiCon);2024-04-25

3. Multi-objective Feature Selection Algorithm Based on Apache Spark and Particle Swarm Optimization;2023 IEEE International Conference on Control, Electronics and Computer Technology (ICCECT);2023-04-28

4. A Stable Large-Scale Multiobjective Optimization Algorithm with Two Alternative Optimization Methods;Entropy;2023-03-25
