A Highly Adaptive Oversampling Approach to Address the Issue of Data Imbalance

Published:2022-05-04 Issue:5 Volume:11 Page:73
ISSN:2073-431X
Container-title:Computers
language:en
Short-container-title:Computers

Author:

Szeghalmy Szilvia, Fazekas Attila

Abstract

Data imbalance is a serious problem in machine learning that can be alleviated at the data level by balancing the class distribution with sampling. In the last decade, several sampling methods have been published to address the shortcomings of the initial ones, such as noise sensitivity and incorrect neighbor selection. Based on the review of the literature, it has become clear to us that the algorithms achieve varying performance on different data sets. In this paper, we present a new oversampler that has been developed based on the key steps and sampling strategies identified by analyzing dozens of existing methods and that can be fitted to various data sets through an optimization process. Experiments were performed on a number of data sets, which show that the proposed method had a similar or better effect on the performance of SVM, DTree, kNN and MLP classifiers compared with other well-known samplers found in the literature. The results were also confirmed by statistical tests.
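To make the idea of balancing a class distribution by oversampling concrete, the following is a minimal sketch of a generic SMOTE-style interpolation step. It is not the oversampler proposed in the paper, whose steps and optimization process are not described in this record. The function name `smote_like_oversample`, its parameters, and the two-class assumption are all hypothetical choices made for illustration.

```python
import random


def smote_like_oversample(X, y, minority_label, k=3, seed=0):
    """Add synthetic minority points until both classes have equal size.

    Illustrative assumption: exactly two classes. Each synthetic point lies
    on the line segment between a minority sample and one of its k nearest
    minority neighbours (generic SMOTE-style interpolation).
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")

    # Number of synthetic samples required to match the majority class size.
    n_needed = (len(y) - len(minority)) - len(minority)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base sample.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        X_new.append([b + t * (o - b) for b, o in zip(base, other)])
        y_new.append(minority_label)
    return X_new, y_new
```

Called on a data set with six majority and three minority samples, this sketch would add three synthetic minority points. The shortcomings the abstract mentions, such as noise sensitivity and incorrect neighbor selection, apply to exactly this kind of naive neighbor-based interpolation.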

Funder

European Social Fund

Publisher

MDPI AG

Subject

Computer Networks and Communications, Human-Computer Interaction

Link

https://www.mdpi.com/2073-431X/11/5/73/pdf

References: 57 articles.

1. Learning from Imbalanced Data Sets;Fernández,2018

2. A Heterogeneous Ensemble Learning Framework for Spam Detection in Social Networks with Imbalanced Data

3. A minority oversampling approach for fault detection with heterogeneous imbalanced data

4. A Quadruplet Deep Metric Learning model for imbalanced time-series fault diagnosis

5. Predicting disease risks from highly imbalanced data using random forest

Cited by 2 articles.

1. A comparative study on noise filtering of imbalanced data sets;Knowledge-Based Systems;2024-10

2. A Comparative Study of the Use of Stratified Cross-Validation and Distribution-Balanced Stratified Cross-Validation in Imbalanced Learning;Sensors;2023-02-20