Sampling strategies for handling data imbalance problem: An Extensive Review

Published:2023 Issue:1 Volume:26 Page:177-187
ISSN:0972-0510
Container-title:Journal of Statistics & Management Systems
Short-container-title:JSMS

Author:

Veedhi Bhaskar Kumar, Mishra Debahuti, Das Kaberi

Abstract

Classifying imbalanced data is a major issue in data mining. Researchers have proposed many solutions to the imbalanced data problem, which fall broadly into two categories: data-level and algorithm-level methods. Data-level methods adjust the class distributions. Algorithm-level methods create a new algorithm or modify an existing one. The imbalanced data classification problem can be addressed by sampling, random oversampling, random undersampling, resampling, and SMOTE (Synthetic Minority Oversampling Technique). Resampling includes k-means clustering, density-based clustering, neural networks, and ensembles. However, no single algorithm or method can remove bias in data classification. Extensive integration is therefore needed to reach the desired accuracy and performance: kernel methods with sampling methods, sampling with boosting methods, or kernel-based methods with Support Vector Machines (SVM). The main objective of this paper is to review sampling strategies based on sampling and resampling methods and to improve learning from class-imbalanced data. It also explains the objectives of the models used by several researchers and emphasizes their performance and outcomes.
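
As a rough illustration of the data-level methods named in the abstract, the sketch below shows random oversampling, random undersampling, and a SMOTE-style interpolation step in plain Python. It is not taken from the paper. The function names (`random_oversample`, `random_undersample`, `smote_like`), the toy dataset, and the parameter `k` are assumptions made for illustration, and `smote_like` is a simplified approximation of SMOTE that assumes at least two minority points.

```python
import random
from collections import Counter


def rows_of(X, y, label):
    """Return the rows of X whose label equals `label`."""
    return [x for x, lab in zip(X, y) if lab == label]


def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = rows_of(X, y, label)
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        for x in rng.sample(rows_of(X, y, label), target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic points, each on the line segment between a
    random minority point and one of its k nearest minority neighbours
    (squared Euclidean distance). Simplified sketch, not a full SMOTE."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 8 majority rows (label 0), 3 minority rows (label 1).
X = [(float(i), float(i % 3)) for i in range(8)]
X += [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]
y = [0] * 8 + [1] * 3

rng = random.Random(0)  # fixed seed so runs are repeatable
X_over, y_over = random_oversample(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)
minority = rows_of(X, y, 1)
synthetic = smote_like(minority, n_new=5, k=2, rng=rng)
```

Oversampling keeps every original row and can encourage overfitting to duplicated minority rows, undersampling discards majority information, and the interpolation step adds new minority points instead of copies. These trade-offs are one reason the abstract argues for combining sampling with other methods.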

Publisher

Taru Publications

Subject

General Medicine