Abstract
AbstractOnline supervised learning from fast-evolving data streams, particularly in domains such as health, the environment, and manufacturing, is a crucial research area. However, these domains often experience class imbalance, which can skew class distributions. It is essential for online learning algorithms to analyze large datasets in real-time while accurately modeling rare or infrequent classes that may appear in bursts. While methods have been proposed to handle binary class imbalance, there is a lack of attention to multi-class imbalanced settings with varying degrees of imbalance in evolving streams. In this paper, we present the Dynamic Queues (DynaQ) algorithm for online learning in multi-class imbalanced settings to fill this knowledge gap. Our approach utilizes a batch-based resampling method that creates an instance queue for each class to balance the number of instances. We maintain a queue threshold and remove older samples during training. Additionally, we dynamically oversample minority classes based on one of four rate parameters: recall, F1-score, $$\kappa _m$$
κ
m
, and Euclidean distance. Our learning algorithm consists of an ensemble that uses sliding windows and a soft voting schema while incorporating a drift detection mechanism. Our experimental results demonstrate the superiority of the DynaQ approach over state-of-the-art methods.
Publisher
Springer Science and Business Media LLC
Reference56 articles.
1. Aguiar G, Krawczyk B, Cano A (2022) A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework. arXiv:2204.03719
2. Alcalá-Fdez J, Fernández A, Luengo J et al (2011) Keel data mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J Mult-Valued Log Soft Comput 17(2–3):255–287
3. Alliance Canada Compute (2022) Available Resources. last access on November 2022 https://alliancecan.ca
4. Aminian E, Ribeiro RP, Gama J (2021) Chebyshev approaches for imbalanced data streams regression models. Data Min Knowl Discov 35:2389–2466
5. Bernardo A, Della Valle E (2021) Smote-ob: Combining smote and online bagging for continuous rebalancing of evolving data streams. In: 2021 IEEE International Conference on Big Data (Big Data). IEEE, p 5033–5042
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献