Multi-class WHMBoost: An ensemble algorithm for multi-class imbalanced data

Published:2022-04-18 Issue:3 Volume:26 Page:599-614
ISSN:1088-467X
Container-title:Intelligent Data Analysis
Short-container-title:IDA

Author:

Zhao Jiakun,Jin Ju,Zhang Yibo,Zhang Ruifeng,Chen Si

Abstract

The imbalanced data problem is widespread in the real world. Ignoring it when training machine learning models degrades model performance. Researchers have proposed many methods for dealing with imbalanced data, but these methods mainly focus on two-class classification tasks. Learning from multi-class imbalanced data sets is still an open problem. In this paper, an ensemble method for classifying multi-class imbalanced data sets is put forward, called multi-class WHMBoost. It is an extension of WHMBoost, which we proposed earlier. Instead of the data-processing algorithm used in WHMBoost, we use random balance based on average size to balance the data distribution. The weak classifiers we use in the boosting algorithm are a support vector machine and a decision tree classifier. During training, they participate with given weights so that they complement each other’s advantages. On 18 multi-class imbalanced data sets, we compared the performance of multi-class WHMBoost with state-of-the-art ensemble algorithms using MAUC, MG-mean and MMCC as evaluation criteria. The results demonstrate that it has clear advantages over these algorithms and can effectively deal with multi-class imbalanced data sets.
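The abstract does not spell out the balancing step or the metric definitions, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "random balance based on average size" can be approximated by randomly resampling every class to the mean class size, and that MG-mean is the geometric mean of per-class recall. The function names `balance_to_average_size` and `multiclass_g_mean` are hypothetical.

```python
import numpy as np


def balance_to_average_size(X, y, seed=0):
    """Randomly resample every class to the mean class size.

    Assumption: this is one plausible reading of "random balance based
    on average size"; the paper's exact procedure may differ.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = int(round(float(counts.mean())))
    picked = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        # Undersample without replacement when the class is large enough,
        # oversample with replacement when it is smaller than the target.
        picked.append(rng.choice(idx, size=target, replace=bool(n < target)))
    picked = np.concatenate(picked)
    return X[picked], y[picked]


def multiclass_g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of MG-mean)."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.prod(recalls) ** (1.0 / len(classes)))
```

With class counts of 10, 4 and 1, the mean size is 5, so the largest class is undersampled and the two smaller ones are oversampled until each holds 5 samples. A metric like the one above drops to zero as soon as any single class is never predicted correctly, which is why recall-based geometric means are commonly used for imbalanced multi-class evaluation.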

Publisher

IOS Press

Subject

Artificial Intelligence,Computer Vision and Pattern Recognition,Theoretical Computer Science

Reference: 31 articles.

1. N. Japkowicz, Learning from Imbalanced Data Sets: A Comparison of Various Strategies, 2000.

2. Multi-class imbalance in text classification: A feature engineering approach to detect cyberbullying in twitter;Talpur;Informatics,2020

3. C. Arun and C. Lakshmi, Class Imbalance in Software Fault Prediction Data Set, 2020.

4. A study of the behavior of several methods for balancing machine learning training data;Batista;SIGKDD Explor,2004

5. Survey on deep learning with class imbalance;Johnson;Journal of Big Data,2019

Cited by 1 article.

1. HSNF: Hybrid sampling with two-step noise filtering for imbalanced data classification;Intelligent Data Analysis;2023-11-20