Abstract
This paper proposes Macro SOStream, a new evolving algorithm for data stream clustering that learns entirely online and is based on self-organizing density. Macro SOStream builds on the SOStream algorithm but adds macroclusters composed of microclusters: while microclusters are spherical, macroclusters can take arbitrary shapes. Moreover, Macro SOStream includes a macrocluster merge step designed specifically to improve its performance under data drift. To validate our proposal, we compare Macro SOStream with the SOStream and DenStream algorithms on four synthetic datasets, using the Adjusted Rand Index (ARI) as the performance metric. We also carry out an exhaustive analysis of how the choice of hyperparameters affects the performance of these algorithms. The results show that Macro SOStream performs well, particularly under data drift and when non-spherical clusters are required.
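The abstract does not specify how microclusters are linked into macroclusters. As a purely illustrative sketch, assume (hypothetically) that two spherical microclusters belong to the same macrocluster whenever their spheres overlap. The connected components of that overlap graph then form macroclusters, and a chain of overlapping spheres yields an elongated, non-spherical shape. The names `MicroCluster`, `overlaps`, and `group_into_macroclusters` are hypothetical and not taken from the paper, and the sketch omits the density updates, fading, and macrocluster merge logic of the actual algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    # Hypothetical representation: a spherical microcluster with a centroid and a radius.
    center: tuple[float, ...]
    radius: float


def overlaps(a: MicroCluster, b: MicroCluster) -> bool:
    # Assumed linking rule (not from the paper): two spheres overlap when the
    # distance between their centers does not exceed the sum of their radii.
    return math.dist(a.center, b.center) <= a.radius + b.radius


def group_into_macroclusters(micros: list[MicroCluster]) -> list[int]:
    """Return a macrocluster label for each microcluster via union-find over overlaps."""
    parent = list(range(len(micros)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Join every pair of overlapping microclusters into the same component.
    for i in range(len(micros)):
        for j in range(i + 1, len(micros)):
            if overlaps(micros[i], micros[j]):
                parent[find(i)] = find(j)

    # Relabel component roots as consecutive macrocluster ids 0, 1, 2, ...
    roots: dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(micros))]


# Usage: five spheres along a line (centers 1.0 apart, radii summing to 1.2)
# form one elongated macrocluster; a distant sphere forms its own.
chain = [MicroCluster((float(x), 0.0), 0.6) for x in range(5)]
isolated = MicroCluster((20.0, 20.0), 0.6)
labels = group_into_macroclusters(chain + [isolated])
print(labels)  # [0, 0, 0, 0, 0, 1]
```

Union-find is used here only because it groups overlapping spheres into connected components in near-linear time per union; any connected-components method would serve the same illustrative purpose.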
Funder
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science