Performance of Parallel K-Means Algorithms in Java

Published: 2022-03-29; Issue: 4; Volume: 15; Page: 117
ISSN: 1999-4893
Journal: Algorithms
Language: English

Author:

Libero Nigro

Abstract

K-means is a well-known clustering algorithm, often used for its simplicity and potential efficiency. Its properties and limitations have been investigated in many works reported in the literature. K-means, though, suffers from computational problems when dealing with large datasets with many dimensions and a large number of clusters. Therefore, many authors have proposed and experimented with different techniques for the parallel execution of K-means. This paper describes a novel approach to parallel K-means that targets today's commodity multicore machines with shared memory. Two reference implementations in Java are developed and their performance is compared. The first one is structured according to a map/reduce schema that leverages the multi-threaded concurrency provided automatically by Java parallel streams. The second one, allocated on the available cores, exploits the parallel programming model of the Theatre actor system, which is control-based, entirely lock-free, and purposely relies on threads as coarse-grain “programming-in-the-large” units. The experimental results confirm that good execution performance can be achieved through the implicit and intuitive use of Java concurrency in parallel streams. However, the modular Theatre implementation delivers better execution performance, as it proves more adequate for exploiting the available computational resources.
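To make the map/reduce schema concrete, here is a minimal sketch of K-means (Lloyd's iteration) over a Java parallel stream. It is an illustration under stated assumptions, not the paper's code: the class `ParallelKMeans` and its methods are hypothetical, points are plain `double[][]` arrays, the caller supplies the initial centroids, and no Theatre actors are involved. In each iteration, stream workers assign points to their nearest centroid (map) and accumulate per-cluster sums and counts in a private buffer; the buffers are then merged (reduce) and divided to obtain the new centroids.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical sketch: K-means with a parallel-stream map/reduce step (not the paper's implementation). */
public class ParallelKMeans {

    /** Index of the centroid closest to p (squared Euclidean distance). */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int k = 0; k < centroids.length; k++) {
            double d = 0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centroids[k][j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = k; }
        }
        return best;
    }

    /** Runs at most maxIter iterations, stopping early once the centroids stop changing. */
    public static double[][] run(double[][] points, double[][] initial, int maxIter) {
        int k = initial.length, dim = initial[0].length;
        double[][] centroids = Arrays.stream(initial).map(double[]::clone).toArray(double[][]::new);
        for (int iter = 0; iter < maxIter; iter++) {
            final double[][] current = centroids;
            // Map + reduce: each worker fills a private buffer laid out as
            // k blocks of (dim coordinate sums, 1 member count); buffers are merged element-wise.
            double[] acc = IntStream.range(0, points.length).parallel()
                .collect(() -> new double[k * (dim + 1)],
                         (buf, i) -> {
                             int c = nearest(points[i], current);
                             int base = c * (dim + 1);
                             for (int j = 0; j < dim; j++) buf[base + j] += points[i][j];
                             buf[base + dim] += 1; // member count of cluster c
                         },
                         (a, b) -> { for (int t = 0; t < a.length; t++) a[t] += b[t]; });
            // New centroid = sum / count; an empty cluster keeps its previous centroid.
            double[][] next = new double[k][dim];
            for (int c = 0; c < k; c++) {
                double count = acc[c * (dim + 1) + dim];
                for (int j = 0; j < dim; j++) {
                    next[c][j] = count > 0 ? acc[c * (dim + 1) + j] / count : current[c][j];
                }
            }
            boolean converged = Arrays.deepEquals(next, current);
            centroids = next;
            if (converged) break;
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] c = run(pts, new double[][]{{0, 0}, {10, 10}}, 20);
        System.out.println(Arrays.deepToString(c)); // prints [[0.0, 0.5], [10.0, 10.5]]
    }
}
```

Using `IntStream.collect` with a per-worker buffer keeps the assignment step free of shared mutable state and locks; the trade-off is that the stream framework, not the programmer, decides how the work is split across threads, whereas the abstract describes the Theatre version as explicitly allocating coarse-grain threads to the available cores.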

Publisher

MDPI AG

Subject

Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science

Link

https://www.mdpi.com/1999-4893/15/4/117/pdf

Cited by 11 articles.

1. A K-Means Variation Based on Careful Seeding and Constrained Silhouette Coefficients; Lecture Notes in Networks and Systems; 2024

2. Modeling and Analysis of Clustering by Medoids Using Uppaal; Lecture Notes in Networks and Systems; 2024

3. Improving Clustering Accuracy of K-Means and Random Swap by an Evolutionary Technique Based on Careful Seeding; Algorithms; 2023-12-17

4. A Review of Data Mining, Big Data Analytics and Machine Learning Approaches; Journal of Computing and Natural Science; 2023-10-05

5. An Efficient Algorithm for Clustering Sets; 2023 IEEE/ACM 27th International Symposium on Distributed Simulation and Real Time Applications (DS-RT); 2023-10-04