Affiliation:
1. University of São Paulo, São Paulo, Brazil
Abstract
In the task of multi-label classification in data streams, instances arriving in real-time need to be associated with multiple labels simultaneously. Various methods based on the k Nearest Neighbors algorithm have been proposed to address this task. However, these methods face limitations when dealing with imbalanced data streams, a problem that has received limited attention in existing works. To approach this gap, this article introduces the Imbalance-Robust Multi-Label Self-Adjusting kNN (IRMLSAkNN), designed to tackle multi-label imbalanced data streams. IRMLSAkNN’s strength relies on maintaining relevant instances with imbalance labels by using a discarding mechanism that considers the imbalance ratio per label. On the other hand, it evaluates subwindows with an imbalance-aware measure to discard older instances that are lacking performance. We conducted statistical experiments on 32 benchmark data streams, evaluating IRMLSAkNN against eight multi-label classification algorithms using common accuracy-aware and imbalance-aware measures. The obtained results demonstrate that IRMLSAkNN consistently outperforms these algorithms in terms of predictive capacity and time cost across various levels of imbalance.
Funder
CEPID-CeMEAI-Center for Mathematical Sciences Applied to Industry
Publisher
Association for Computing Machinery (ACM)