Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation

Published: 2022-06-25 Issue: 13 Volume: 12 Page: 6464
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Cândido Paulo Gustavo Lopes, Silva Jonathan Andrade, Faria Elaine Ribeiro, Naldi Murilo Coelho

Abstract

The increasing volume and velocity of continuously generated data (data streams) challenge machine learning algorithms, which must evolve to fit real-world problems. Data stream clustering algorithms face issues such as the rapidly increasing volume of data and variation in the number of clusters and their shapes. The present work aims to improve the accuracy of sequentially clustering batches of data streams in scenarios where clusters evolve dynamically and continuously, while automatically estimating their number. To achieve this goal, three evolutionary algorithms are presented, along with three novel algorithms designed to deal with normally distributed clusters based on goodness-of-fit tests, in the context of scalable batch stream clustering with automatic estimation of the number of clusters. All of them are developed on top of MapReduce, Discretized-Stream models, and the most recent MPC frameworks to provide scalability, reliability, resilience, and flexibility. The proposed algorithms are experimentally compared with state-of-the-art methods and achieve the best accuracy on normally distributed data sets.

Funder

São Paulo Research Foundation

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/13/6464/pdf

References: 36 articles.

1. Knowledge Discovery from Data Streams;Gama,2010

2. Machine learning for streaming data

3. Data stream clustering

4. Clustering in the Presence of Concept Drift;Moulton;Proceedings of the Machine Learning and Knowledge Discovery in Databases

5. Comparison Among Methods for k Estimation in k-means;Naldi;Proceedings of the Ninth International Conference on Intelligent Systems Design and Applications,2009

Cited by 1 article.

1. Efficient Online Stream Clustering Based on Fast Peeling of Boundary Micro-Cluster;IEEE Transactions on Neural Networks and Learning Systems;2024