Clustering Based Undersampling for Handling Class Imbalance in C4.5 Classification Algorithm

Published:2020-11-01 Issue:1 Volume:1641 Page:012014
ISSN:1742-6588
Container-title:Journal of Physics: Conference Series
Short-container-title:J. Phys.: Conf. Ser.

Author:

Nugraha Wahyu,Maulana Muhammad Sony,Sasongko Agung

Abstract

It is very difficult for machine learning to build an effective model when the class distribution in the training data set is imbalanced. Class imbalance is common in real-world classification problems, where one class has very few instances (the minority class) while the other classes have many (the majority class). Building a model without accounting for class imbalance causes it to be flooded by majority-class instances, so that it ignores minority-class predictions. Random undersampling and oversampling techniques have been widely used in various studies to overcome class imbalance. This study uses an undersampling strategy combined with clustering, with C4.5 as the classification model. Clustering is used to group the data, and undersampling is then performed within each group, so that useful samples are not eliminated. Statistical tests on experiments using 10 imbalanced datasets of various sample sizes from the KEEL repository and Kaggle indicate that clustering-based undersampling produces satisfactory performance. The improvement is visible in the sensitivity and AUC values, which increased significantly.
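
The general idea of undersampling within clusters can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm, number of clusters, or per-cluster sampling rule the authors used, so everything below is an assumption made for illustration: a plain k-means with a hypothetical `k`, and a proportional quota per cluster. The function names `kmeans` and `cluster_undersample` are likewise hypothetical. The reduced majority set would then be combined with the minority class and passed to a decision-tree learner such as C4.5, which is not shown here.

```python
import random

def _dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Very small k-means; returns one cluster label per point.
    Assumption: k-means stands in for whatever clustering the paper used."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: _dist2(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep roughly n_keep majority samples, drawn from every cluster.
    Assumption: each cluster's quota is proportional to its size, with at
    least one sample per non-empty cluster, so the total is approximate."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    clusters = [[p for p, l in zip(majority, labels) if l == c]
                for c in range(k)]
    kept = []
    for members in clusters:
        if not members:
            continue
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept
```

Because every non-empty cluster contributes at least one sample, no region of the majority class is dropped entirely, which is the property the abstract attributes to clustering-based undersampling as opposed to purely random undersampling.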

Publisher

IOP Publishing

Subject

General Physics and Astronomy

Link

https://iopscience.iop.org/article/10.1088/1742-6596/1641/1/012014/pdf

References: 17 articles.

1. Clustering-based undersampling in class-imbalanced data;Lin;Inf. Sci. (Ny).,2017

2. Cost-sensitive boosting for classification of imbalanced data;Sun;Pattern Recognit.,2007

3. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches;Galar;IEEE Trans. Syst. Man Cybern. Part C Appl. Rev.,2012

4. A novel ensemble method for classifying imbalanced data;Sun;Pattern Recognit.,2015

5. Exploratory under-sampling for class-imbalance learning;Liu,2009

Cited by 8 articles.

1. Evaluating the Role of Data Enrichment Approaches towards Rare Event Analysis in Manufacturing;Sensors;2024-08-02

2. Supervised and unsupervised machine learning approaches using Sentinel data for flood mapping and damage assessment in Mozambique;Remote Sensing Applications: Society and Environment;2023-11

3. Density-Based Clustering to Deal with Highly Imbalanced Data in Multi-Class Problems;Mathematics;2023-09-21

4. Data Balance Optimization of Fraud Classification for E-Commerce Transaction;2022 Seventh International Conference on Informatics and Computing (ICIC);2022-12-08

5. A dual evolutionary bagging for class imbalance learning;Expert Systems with Applications;2022-11