Parallel and fault-tolerant k-means clustering based on the actor model

Published:2020-12-31 Issue:4 Volume:16 Page:379-396
ISSN:1875-9076
Container-title:Multiagent and Grid Systems
Short-container-title:MGS

Author:

Taamneh Salah, Qawasmeh Ahmad, Aljammal Ashraf H.

Abstract

The k-means algorithm is a well-known unsupervised machine learning tool that splits a given dataset into a fixed number of clusters via an iterative refinement approach. Running such an algorithm on today's datasets, which are characterized by high dimensionality and huge size, requires fault-tolerance mechanisms to mitigate the impact of possible failures. In this paper, we propose an actor-based implementation of the k-means algorithm. The algorithm was made fault-tolerant by periodically saving the centroids to stable storage during failure-free execution and restarting from the last saved centroids upon a failure. This was implemented in two different ways: optimistic checkpointing (blocking) and pessimistic checkpointing (non-blocking). The actor-based k-means algorithm was evaluated on a machine with eight cores. The experiments showed that the proposed algorithm scales very well as the number of workers increases and can be up to about 2x faster than a Java-thread-based implementation of the k-means algorithm. The results also showed that the optimistic algorithm outperformed the pessimistic one, particularly in the presence of competing I/O operations. Several failures were forced to occur during execution to evaluate the performance of the fault-tolerant implementations. The experiments showed that the average amount of lost work ranged from 3% to 6%.
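The checkpoint-and-restart idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' actor-based implementation: it has no actors or workers, it shows only a blocking checkpoint, and every name in it (`save_checkpoint`, `load_checkpoint`, `kmeans`, `ckpt_every`, `fail_at`, the JSON file layout) is a hypothetical choice made for illustration.

```python
import json
import os


def save_checkpoint(path, iteration, centroids):
    """Blocking checkpoint: write to a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"iteration": iteration, "centroids": centroids}, f)
        f.flush()
        os.fsync(f.fileno())  # force the data to disk before renaming
    os.replace(tmp, path)  # atomic rename, so a crash never leaves a partial file


def load_checkpoint(path):
    """Return (iteration, centroids) from the last checkpoint, or None."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        state = json.load(f)
    return state["iteration"], state["centroids"]


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    def sqdist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda j: sqdist(p, centroids[j]))
            for p in points]


def update(points, labels, centroids):
    """Update step: mean of each cluster; an empty cluster keeps its centroid."""
    new = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new.append(list(old))
        else:
            new.append([sum(p[d] for p in members) / len(members)
                        for d in range(len(old))])
    return new


def kmeans(points, init_centroids, ckpt_path, max_iter=100, ckpt_every=5,
           fail_at=None):
    """k-means that resumes from the last saved centroids if a checkpoint exists.

    fail_at is a hypothetical test hook that simulates a crash at that iteration.
    """
    restored = load_checkpoint(ckpt_path)
    if restored:
        start, centroids = restored
    else:
        start, centroids = 0, [list(c) for c in init_centroids]
    for it in range(start, max_iter):
        if fail_at is not None and it == fail_at:
            raise RuntimeError("simulated failure")
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        converged = new_centroids == centroids
        centroids = new_centroids
        if (it + 1) % ckpt_every == 0:
            save_checkpoint(ckpt_path, it + 1, centroids)
        if converged:
            break
    return centroids
```

In this sketch, a run that crashes and is then restarted with the same `ckpt_path` repeats only the iterations performed since the last checkpoint, which corresponds to the "lost work" the abstract measures. A smaller `ckpt_every` reduces lost work at the cost of more frequent writes.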

Publisher

IOS Press

Subject

General Computer Science

References: 31 articles.

1. W. Zhao, H. Ma and Q. He, Parallel k-means clustering based on mapreduce, in: IEEE International Conference on Cloud Computing, Springer, Berlin, Heidelberg, 2009, pp. 674–679.

2. K. Stoffel and A. Belkoniene, Parallel k/h-means clustering for large data sets, in: European Conference on Parallel Processing, Springer, Berlin, Heidelberg, 1999, pp. 1451–1454.

3. Kantabutra, Parallel k-means clustering algorithm on NOWs, NECTEC Technical Journal, 2000.

4. Z. Lv, Y. Hu, H. Zhong, J. Wu, B. Li and H. Zhao, Parallel k-means clustering of remote sensing images based on mapreduce, in: International Conference on Web Information Systems and Mining, Springer, Berlin, Heidelberg, 2010, pp. 162–170.

5. Zhang, The study of parallel k-means algorithm, in: 2006 6th World Congress on Intelligent Control and Automation, 2006.

Cited by 3 articles.

1. Improving Performance Estimation of Smart City Simulations Using the Actor Model;Anais da XV Escola Regional de Alto Desempenho de São Paulo (ERAD-SP 2024);2024-05-16

2. Parallel power load abnormalities detection using fast density peak clustering with a hybrid canopy-K-means algorithm;Intelligent Data Analysis;2024-01-25

3. A Robust Distributed Clustering of Large Data Sets on a Grid of Commodity Machines;Data;2021-07-07