An automated approach for binary classification on imbalanced data

Published: 2024-01-12; Volume: 66; Issue: 5; Pages: 2747–2767
ISSN: 0219-1377
Journal: Knowledge and Information Systems
Language: English
Journal abbreviation: Knowl Inf Syst

Authors:

Vieira, Pedro Marques; Rodrigues, Fátima

Abstract

Imbalanced data are present in many business sectors and must be handled with appropriate resampling methods and classification algorithms. Numerous combinations of resampling and learning methods exist for this purpose, but using them effectively requires specialised knowledge. This paper considers several data resampling approaches for handling imbalanced data, ranging from more accessible to more advanced techniques. The developed application recommends the most suitable combinations of techniques for a specific dataset by extracting the dataset's meta-feature values and comparing them with those recorded in a knowledge base. It simplifies classification and automates part of the machine learning pipeline, achieving results comparable to or better than state-of-the-art solutions with a much shorter execution time.
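The recommendation step described in the abstract (compare a new dataset's meta-features with those stored in a knowledge base, then reuse the technique combination of the closest match) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three meta-features (log instance count, log feature count, log imbalance ratio), the 1-nearest-neighbour rule with Euclidean distance, the names `KBEntry`, `meta_features` and `recommend`, and every knowledge-base entry are hypothetical placeholders chosen for the example.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class KBEntry:
    """Hypothetical knowledge-base record: meta-features of a past dataset and
    the resampling + classifier pairing assumed to have performed best on it."""
    meta: tuple[float, float, float]
    combination: str


def meta_features(X: list[list[float]], y: list[int]) -> tuple[float, float, float]:
    """Assumed minimal meta-feature set: log #instances, log #features,
    log imbalance ratio (majority count / minority count)."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected a binary target with both classes present")
    majority, minority = max(counts.values()), min(counts.values())
    return (
        math.log(len(X)),
        math.log(len(X[0])),
        math.log(majority / minority),
    )


def recommend(X: list[list[float]], y: list[int], kb: list[KBEntry]) -> str:
    """Return the combination stored for the nearest past dataset
    (1-nearest neighbour, Euclidean distance in meta-feature space)."""
    query = meta_features(X, y)
    nearest = min(kb, key=lambda entry: math.dist(query, entry.meta))
    return nearest.combination


# Placeholder knowledge base: these pairings are invented for illustration,
# not results reported in the paper.
KB = [
    KBEntry((math.log(500), math.log(10), math.log(1.5)), "no resampling + logistic regression"),
    KBEntry((math.log(5000), math.log(20), math.log(9.0)), "SMOTE + random forest"),
    KBEntry((math.log(200), math.log(8), math.log(19.0)), "random undersampling + decision tree"),
]

# Toy query dataset: 200 rows, 8 features, 180 negatives vs 20 positives (ratio 9).
X_demo = [[0.0] * 8 for _ in range(200)]
y_demo = [0] * 180 + [1] * 20
print(recommend(X_demo, y_demo, KB))  # → random undersampling + decision tree
```

Note that the toy query's imbalance ratio matches the second entry exactly, yet the third entry is nearer because its size and dimensionality match more closely; a practical recommender would likely use a richer meta-feature set and weight or normalise the features before measuring similarity.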

Funder

Instituto Politécnico do Porto

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10115-023-02046-7.pdf
