Affiliation:
1. College of Agriculture, Al-Muthanna University, Iraq
2. Data Engineering and Semantics Research Unit, Faculty of Sciences, University of Sfax, Tunisia
Abstract
In machine learning, an imbalanced classification problem refers to a dataset in which the classes are not evenly distributed. This problem commonly arises when the distribution of labels is far from uniform. Resampling methods, which add samples to the minority class or drop samples from the majority class, are a widely used remedy. This study proposes a framework for handling imbalanced datasets in fraud detection. To this end, undersampling (Random and NearMiss) and oversampling (Random, SMOTE, and Borderline-SMOTE) were applied as resampling techniques to balance the evaluated dataset. For the first time, a large-scale imbalanced dataset collected from the Kaggle website was used to test both approaches for detecting fraud in electricity and gas consumption at the Tunisian utility company. The balanced data were then evaluated with four machine learning classifiers: Logistic Regression (LR), Naïve Bayes (NB), Random Forest (RF), and XGBoost. Standard evaluation metrics, including precision, recall, F1-score, and accuracy, were used to assess the results. The experiments clearly showed that the RF model outperformed all other classifiers, attaining a classification accuracy of 89% with NearMiss undersampling and 99% with Random oversampling.
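Below is a minimal sketch of the resampling-and-classification pipeline outlined in the abstract, assuming the imbalanced-learn and scikit-learn Python libraries. The synthetic dataset generated here is only a stand-in for the Kaggle electricity-and-gas fraud data, which is not distributed with this page, and the hyperparameters are illustrative rather than those used by the authors.

```python
# Sketch of the resampling + classification pipeline (illustrative only).
from sklearn.datasets import make_classification          # stand-in for the Kaggle fraud data
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from imblearn.under_sampling import RandomUnderSampler, NearMiss
from imblearn.over_sampling import RandomOverSampler, SMOTE, BorderlineSMOTE

# Synthetic imbalanced data: roughly 5% positive (fraud) class.
X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# The five resampling strategies named in the abstract.
samplers = {
    "RandomUnder": RandomUnderSampler(random_state=42),
    "NearMiss": NearMiss(),
    "RandomOver": RandomOverSampler(random_state=42),
    "SMOTE": SMOTE(random_state=42),
    "BorderlineSMOTE": BorderlineSMOTE(random_state=42),
}

for name, sampler in samplers.items():
    # Resample only the training split; the test split keeps its natural imbalance.
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_res, y_res)
    y_pred = clf.predict(X_test)
    print(f"--- {name} ---")
    print(classification_report(y_test, y_pred, digits=3))
```

Resampling is applied only to the training split so that the test set retains the original class imbalance, which keeps the reported precision, recall, and F1-score estimates free of leakage from synthetic or duplicated samples.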
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, General Medicine
Cited by
3 articles.