Learning from imbalanced data sets with boosting and data generation

Published: 2004-06 Issue: 1 Volume: 6 Page: 30-39
ISSN: 1931-0145
Container-title: ACM SIGKDD Explorations Newsletter
Language: en
Short-container-title: SIGKDD Explor. Newsl.

Author:

Guo Hongyu¹, Viktor Herna L.¹

Affiliation:

1. University of Ottawa, Ottawa, Ontario, Canada

Abstract

Learning from imbalanced data sets, where the number of examples of one (majority) class is much higher than that of the others, presents an important challenge to the machine learning community. Traditional machine learning algorithms may be biased towards the majority class, thus producing poor predictive accuracy over the minority class. In this paper, we describe a new approach that combines boosting, an ensemble-based learning algorithm, with data generation to improve the predictive power of classifiers against imbalanced data sets consisting of two classes. In the DataBoost-IM method, hard examples from both the majority and minority classes are identified during execution of the boosting algorithm. Subsequently, the hard examples are used to separately generate synthetic examples for the majority and minority classes. The synthetic data are then added to the original training set, and the class distribution and the total weights of the different classes in the new training set are rebalanced. The DataBoost-IM method was evaluated, in terms of the F-measures, G-mean and overall accuracy, against seventeen highly and moderately imbalanced data sets using decision trees as base classifiers. Our results are promising and show that the DataBoost-IM method compares well with a base classifier, a standard benchmarking boosting algorithm and three advanced boosting-based algorithms for imbalanced data sets. Results indicate that our approach does not sacrifice one class in favor of the other, but produces accurate predictions for both minority and majority classes.
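
The abstract evaluates classifiers by F-measure, G-mean and overall accuracy. The sketch below is not taken from the paper; it uses the commonly cited textbook definitions of these metrics (the abstract itself does not spell them out), and the function name `imbalance_metrics` and the toy labels are hypothetical. It illustrates why overall accuracy alone can hide a classifier that ignores the minority class.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, F-measure and G-mean for a two-class problem.

    `positive` names the class treated as the class of interest
    (typically the minority class). Standard definitions are assumed.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # G-mean: geometric mean of the accuracies on the two classes.
    g_mean = math.sqrt(recall * specificity)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "f_measure": f_measure, "g_mean": g_mean}


# Toy data: 95 majority examples (label 0) and 5 minority examples (label 1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = [0] * 100
baseline = imbalance_metrics(y_true, always_majority)

# A classifier that finds 4 of the 5 minority examples
# at the cost of 5 false alarms on the majority class.
balanced_pred = [0] * 90 + [1] * 5 + [1] * 4 + [0] * 1
balanced = imbalance_metrics(y_true, balanced_pred)
```

On this toy data the always-majority classifier scores high accuracy but zero F-measure and zero G-mean for the minority class, which is the failure mode the abstract describes. Passing `positive=0` gives the F-measure for the majority class instead.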

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1007730.1007736

References: 22 articles.

1. SMOTEBoost: Improving Prediction of the Minority Class in Boosting

Cited by 336 articles.

1. Improved Contraction-Expansion Subspace Ensemble for High-Dimensional Imbalanced Data Classification;IEEE Transactions on Knowledge and Data Engineering;2024-10

2. Machine learning models based on bubble analysis for Bitcoin market crash prediction;Engineering Applications of Artificial Intelligence;2024-09

3. Explainable domain adaptation for imbalanced occupancy estimation;Journal of Building Engineering;2024-09

4. Fairness in machine learning: definition, testing, debugging, and application;Science China Information Sciences;2024-08-15

5. Size-biased Hybrid Model for Software Defect Prediction;OPSEARCH;2024-08-12