Affiliation:
1. USICT, GGSIPU, New Delhi, India
2. Department of Artificial Intelligence & Data Sciences, IGDTUW, Delhi, India
Abstract
The class imbalance problem (CIP) arises when the class distribution of a dataset is not uniform. Many real-world scenarios exhibit CIP, which has drawn researchers' attention to the problem. Training machine learning (ML) models on class-imbalanced datasets is challenging. Ensemble methods in ML train multiple classifiers and combine or average their predictions to reach a final prediction. Ensemble-based methods designed specifically for imbalanced data can overcome the difficulties faced by traditional classifiers and handle CIP. This paper assesses the performance of 19 ensemble methods on 44 imbalanced datasets in order to observe the effect of the class imbalance ratio (CIR). For performance evaluation, we divide these datasets into three categories based on CIR: Slightly Imbalance (SI), Moderately Imbalance (MI), and Highly Imbalance (HI). From this perspective, we observe that different ensemble methods perform well in different categories, suggesting that the percentage of the minority or majority class could serve as a criterion for selecting ensemble methods for class-imbalanced datasets. Visual representations and several non-parametric statistical tests are also used to make the results more reliable.
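The CIR-based grouping described above can be illustrated with a minimal Python sketch. It assumes CIR is computed as the majority-class count divided by the minority-class count, and the cut-offs `SI_MAX_CIR` and `MI_MAX_CIR` are hypothetical placeholders: the abstract states neither the exact CIR definition nor the thresholds that separate SI, MI, and HI.

```python
from collections import Counter

# Hypothetical cut-offs for illustration only; the abstract does not
# give the thresholds actually used to separate SI, MI, and HI.
SI_MAX_CIR = 3.0
MI_MAX_CIR = 9.0


def class_imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count.

    This definition of CIR is an assumption, not taken from the paper.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute a ratio")
    return max(counts.values()) / min(counts.values())


def imbalance_category(cir, si_max=SI_MAX_CIR, mi_max=MI_MAX_CIR):
    """Map a CIR value to 'SI', 'MI', or 'HI' using the given cut-offs."""
    if cir <= si_max:
        return "SI"
    if cir <= mi_max:
        return "MI"
    return "HI"


# Example: 95 majority samples against 5 minority samples.
labels = [0] * 95 + [1] * 5
cir = class_imbalance_ratio(labels)
category = imbalance_category(cir)
```

Keeping the cut-offs as parameters lets them be replaced with the paper's actual values once those are known.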
Subject
Artificial Intelligence, General Engineering, Statistics and Probability