Affiliation:
1. Department of Statistics, School of Economics, Hangzhou Dianzi University, Hangzhou 310018, China
Abstract
This paper addresses the asymmetry problem in sample classification under extreme class imbalance. Following the direction for improving classification under extreme sample imbalance proposed by Krawczyk (2016), it adopts the AdaBoost framework and optimizes the sample weight update function applied in each iteration. The new weight update accounts for the sampling weights of misclassified samples and, in addition, places greater emphasis on misclassified samples from the minority class. This makes the model more adaptable to imbalanced and extremely imbalanced class distributions, makes the weight adjustment for hard-to-classify samples more adaptive, and creates a symmetry between minority and majority samples by adjusting the class distribution of the dataset. On this basis, an imbalance boosting model, Imbalance AdaBoost (ImAdaBoost), is constructed. In the experiments, ImAdaBoost is compared with the original AdaBoost model and with mainstream imbalance classification models on imbalanced datasets with different imbalance ratios, including extremely imbalanced datasets. The results show that ImAdaBoost achieves good minority-class recall on the weakly extreme and general imbalanced sets. On the weakly extreme imbalanced set, the average minority-class recall of the mainstream imbalance classification models is 7% lower than that of ImAdaBoost. ImAdaBoost keeps minority-class recall at the middle level of the compared models while performing well on the comprehensive F1-score, which demonstrates strong stability of minority-class classification on extremely imbalanced datasets.
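The idea of a weight update that penalizes misclassified minority samples more heavily can be illustrated with a minimal sketch. This is not the ImAdaBoost update function, which the abstract does not state; it is the standard AdaBoost reweighting step followed by an assumed extra multiplier. The function name `update_weights` and the parameters `minority_label` and `minority_boost` are hypothetical and introduced only for illustration.

```python
import numpy as np

def update_weights(w, y_true, y_pred, minority_label=1, minority_boost=2.0):
    """One AdaBoost-style reweighting step with extra emphasis on
    misclassified minority samples.

    minority_boost is an assumed, illustrative factor; it is not taken
    from the paper.
    """
    w = np.asarray(w, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    miss = y_true != y_pred

    # Weighted error of the current weak learner, clipped so the log stays finite.
    err = np.clip(np.sum(w[miss]) / np.sum(w), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)

    # Standard AdaBoost: raise weights of misclassified samples, lower the rest.
    new_w = w * np.exp(np.where(miss, alpha, -alpha))

    # Assumed extra step: further up-weight misclassified minority samples.
    new_w[miss & (y_true == minority_label)] *= minority_boost

    # Renormalize so the weights form a distribution again.
    return new_w / new_w.sum(), alpha
```

With uniform starting weights, a misclassified minority sample ends up with a larger weight than a misclassified majority sample, which in turn outweighs every correctly classified sample, so the next weak learner is pushed toward the minority class.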
Subject
Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)
References (29 articles)
1. Garrido (2010). Symmetry and Asymmetry Level Measures. Symmetry.
2. Bejjanki, K.K., Gyani, J., and Gugulothu, N. (2020). Class Imbalance Reduction (CIR): A Novel Approach to Software Defect Prediction in the Presence of Class Imbalance. Symmetry, 12.
3. Zhang, H., and Liu, Q. (2019). Online Learning Method for Drift and Imbalance Problem in Client Credit Assessment. Symmetry, 11.
4. Li, D.C., Chen, S.C., Lin, Y.S., and Hsu, W.Y. (2022). A Novel Classification Method Based on a Two-Phase Technique for Learning Imbalanced Text Data. Symmetry, 14.
5. Zhang (2022). A Sampling Method of Imbalanced Data Based on Sample Space. Zidonghua Xuebao/Acta Autom. Sin.