Author:
Veedhi Bhaskar Kumar, Mishra Debahuti, Das Kaberi
Abstract
Imbalanced data classification is a major issue in data mining. Researchers have proposed many solutions to the imbalanced data problem, and these are broadly categorized into data-level and algorithm-level methods. Data-level methods adjust the class distributions, whereas algorithm-level methods create a new algorithm or modify an existing one. The imbalanced data classification problem can be addressed by means of sampling, random over-sampling, random under-sampling, resampling, and SMOTE (Synthetic Minority Over-sampling Technique). Resampling includes k-means clustering, density-based clustering, neural networks, and ensemble methods. However, no single algorithm or method is able to remove bias in data classification; integrating kernel methods with sampling methods, sampling methods with boosting methods, or kernel-based methods with Support Vector Machines (SVM) therefore needs to be pursued to a much greater extent to reach the desired accuracy and performance. The main objective of this paper is to focus on strategies based on sampling and resampling methods and on how they improve learning from class-imbalanced data. It also explains the objectives of the models used by several researchers and emphasizes their performance and outcomes.
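As an illustration of the data-level methods named above, the following is a minimal Python sketch of random over-sampling, random under-sampling, and a simplified SMOTE-style interpolation. It is not taken from any of the surveyed works: the function names (`random_oversample`, `random_undersample`, `smote_like`), the toy data, and the parameter defaults are assumptions chosen for illustration, and the SMOTE variant is deliberately reduced to its core idea of interpolating between a minority sample and one of its nearest minority neighbours.

```python
import random
from collections import Counter, defaultdict


def group_by_class(samples, labels):
    """Group samples by their class label."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = max(len(g) for g in groups.values())
    out_x, out_y = [], []
    for y, g in groups.items():
        extra = [rng.choice(g) for _ in range(target - len(g))]
        out_x.extend(g + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = min(len(g) for g in groups.values())
    out_x, out_y = [], []
    for y, g in groups.items():
        out_x.extend(rng.sample(g, target))
        out_y.extend([y] * target)
    return out_x, out_y


def smote_like(minority, n_new, k=1, seed=0):
    """Simplified SMOTE-style synthesis (assumes at least two minority samples).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        # Sort the remaining minority samples by squared Euclidean distance to base.
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data: 8 majority samples (label 0) and 2 minority samples (label 1).
X = [[float(i), float(i % 3)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
over_counts = Counter(y_over)
under_counts = Counter(y_under)

minority = [x for x, label in zip(X, y) if label == 1]
synthetic = smote_like(minority, n_new=6)
```

On this toy data, over-sampling balances both classes at the majority size and under-sampling balances them at the minority size, while the SMOTE-style step generates new minority points rather than duplicating existing ones. Which of these is preferable depends on the data set and the classifier, which is the trade-off the surveyed approaches examine.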