A Survey on Imbalanced Data Handling Techniques for Classification

Published:2021-10-07 Issue:10 Volume:9 Page:1341-1347
ISSN:2347-3983
Container-title:International Journal of Emerging Trends in Engineering Research
language:
Short-container-title:IJETER

Author:

Abstract

Classification is a supervised learning task in which instances are assigned to groups according to class labels. Algorithms are trained on labeled datasets to perform this task, so the dataset plays an important role. If instances of one class (the majority class) far outnumber instances of another class (the minority class), to the point that a classifier struggles to learn the characteristics of the minority class, the dataset is termed imbalanced. Such datasets lead to biased predictions and misclassification in real-world use: a model trained on them may report very high accuracy during training but, having seen few minority class instances, fail to predict the minority class and therefore perform poorly. This paper surveys techniques proposed by researchers for handling imbalanced data and discusses a comparison of those techniques based on F-measure.
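To illustrate why accuracy can mislead on imbalanced data and why F-measure is the more informative basis for comparison, here is a minimal sketch in plain Python. The 95/5 label split and the "always predict the majority class" predictor are hypothetical choices made for illustration only; they are not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the F-measure is reported as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced labels: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5

# Hypothetical degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

baseline_accuracy = accuracy(y_true, y_pred)
baseline_f = f_measure(y_true, y_pred, positive=1)

print(f"accuracy:             {baseline_accuracy:.2f}")
print(f"minority F-measure:   {baseline_f:.2f}")
```

Under these assumptions the predictor never identifies a single minority instance, yet its accuracy stays high because the majority class dominates the count; the minority-class F-measure exposes the failure.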

Publisher

The World Academy of Research in Science and Engineering

Subject

General Engineering

Cited by 2 articles.

1. Feature group partitioning: an approach for depression severity prediction with class balancing using machine learning algorithms;BMC Medical Research Methodology;2024-06-03

2. Hierarchical Clustering-Based Synthetic Minority Data Generation for Handling Imbalanced Dataset;Proceedings of Congress on Control, Robotics, and Mechatronics;2023-11-10