Big Data Clustering with Kernel k-Means: Resources, Time and Performance

Published:2018-06 Issue:04 Volume:27 Page:1860006
ISSN:0218-2130
Container-title:International Journal on Artificial Intelligence Tools
language:en
Short-container-title:Int. J. Artif. Intell. Tools

Author:

Tsapanos Nikolaos¹, Tefas Anastasios¹, Nikolaidis Nikolaos¹, Pitas Ioannis¹

Affiliation:

1. Department of Informatics, Aristotle University of Thessaloniki, University Campus, Thessaloniki, Box 54 124, Greece

Abstract

Data clustering is an unsupervised learning task that has found many applications in various scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. A classic clustering algorithm is the so-called k-Means. It is very popular; however, it is unable to handle cases in which the clusters are not linearly separable. Kernel k-Means is a state-of-the-art clustering algorithm that employs the kernel trick to perform clustering in a higher-dimensional space, thus overcoming the limitations of classic k-Means regarding the non-linear separability of the input data. Big Data research, a field that has established itself in the last few years, involves performing tasks on extremely large amounts of data. To meet its challenges, several adaptations of Kernel k-Means have been proposed, each with different requirements in processing power and running time and different trade-offs in performance. In this paper, we present several issues and techniques involving the use of Kernel k-Means for Big Data clustering and examine how each combination of components in a clustering framework fares in terms of resources, time and performance. We use experimental results to evaluate several combinations and provide a recommendation on how to approach a Big Data clustering problem.
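
The kernel trick mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not include any of the Big Data adaptations the paper evaluates; it is the textbook Lloyd-style Kernel k-Means, written with NumPy. The function names (`rbf_kernel`, `kernel_distances`, `kernel_kmeans`), the RBF kernel choice, and the parameters `gamma`, `n_iter` and `seed` are assumptions made for illustration. The key point is that the squared distance from a sample to a cluster centre in feature space can be computed from kernel matrix entries alone, without ever forming the high-dimensional representation.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Illustrative RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_distances(K, labels, k):
    """Squared feature-space distance from every sample to every cluster centre.

    Uses only kernel entries:
    ||phi(x_i) - mu_c||^2 = K_ii - (2/m) * sum_j K_ij + (1/m^2) * sum_{j,l} K_jl,
    where j and l range over the m members of cluster c.
    """
    n = K.shape[0]
    dist = np.full((n, k), np.inf)  # empty clusters stay at infinite distance
    diag = np.diag(K)
    for c in range(k):
        mask = labels == c
        m = mask.sum()
        if m == 0:
            continue
        dist[:, c] = (diag
                      - 2.0 * K[:, mask].sum(axis=1) / m
                      + K[np.ix_(mask, mask)].sum() / m ** 2)
    return dist

def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Plain Kernel k-Means on a precomputed kernel matrix K (random initial labels)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=K.shape[0])
    for _ in range(n_iter):
        new_labels = kernel_distances(K, labels, k).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels
```

Note that this sketch stores the full n × n kernel matrix, so its memory use grows quadratically with the number of samples. That cost is the kind of resource constraint that motivates the adaptations for large data sets discussed in the abstract. Like classic k-Means, the procedure converges to a local optimum that depends on the initial assignment.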

Publisher

World Scientific Pub Co Pte Ltd

Subject

Artificial Intelligence

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218213018600060

References: 35 articles.

1. Data clustering

2. Optimized Data Fusion for Kernel k-Means Clustering

Cited by 1 article.

1. MapReduce framework based big data clustering using fractional integrated sparse fuzzy C means algorithm; IET Image Processing; 2020-10