Author:
Wibowo Prasetyo, Fatichah Chastine
Abstract
Class imbalance occurs when the classes in a dataset are unevenly distributed between a majority and a minority class. The degree of imbalance can range from mild to severe. High class imbalance can distort overall classification accuracy because the model tends to predict the majority class; such a model gives biased results, and its predictions for the minority class have little effect on measured performance. Oversampling is one way to deal with high class imbalance, but only a few oversampling techniques have been applied to this problem. This study presents an in-depth performance analysis of oversampling techniques for addressing high class imbalance. Oversampling balances the number of samples in each class so that modeling yields unbiased evaluation results. We compared the performance of Random Oversampling (ROS), ADASYN, SMOTE, and Borderline-SMOTE, combining each oversampling technique with the machine learning methods Random Forest, Logistic Regression, and k-Nearest Neighbor (KNN). The test results show that Random Forest with Borderline-SMOTE performs best among all oversampling techniques, with an accuracy of 0.9997, precision of 0.9474, recall of 0.8571, F1-score of 0.9000, ROC-AUC of 0.9388, and PR-AUC of 0.8581.
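The core idea shared by SMOTE and its Borderline variant is to generate synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The sketch below illustrates only that interpolation step in plain NumPy; it is a simplified illustration under the usual SMOTE definition, not the authors' implementation (which would typically use a library such as imbalanced-learn), and the function name and toy data are invented for the example.

```python
import numpy as np

def smote_oversample(X_min, n_synthetic, k=5, rng=None):
    """Minimal SMOTE-style sketch: create synthetic minority samples by
    interpolating between each chosen sample and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude each point itself
    k = min(k, n - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]  # k nearest neighbour indices
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(n)                    # pick a random minority sample
        j = neighbours[i, rng.integers(k)]     # pick one of its k neighbours
        gap = rng.random()                     # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

# toy minority class: 6 samples in 2-D feature space
X_min = np.array([[0., 0.], [1., 0.], [0., 1.],
                  [1., 1.], [2., 1.], [1., 2.]])
X_new = smote_oversample(X_min, n_synthetic=10, k=3, rng=0)
print(X_new.shape)  # (10, 2)
```

Because every synthetic point is a convex combination of two real minority samples, the generated data stay inside the minority class's region of feature space. Borderline-SMOTE refines this by applying the same interpolation only to minority samples that lie near the class boundary.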
Publisher
Universitas Pesantren Tinggi Darul Ulum (Unipdu)
Subject
Decision Sciences (miscellaneous), Artificial Intelligence, Information Systems and Management, Information Systems, Computer Science (miscellaneous)
Cited by
10 articles.