Unbalanced Data Processing and Machine Learning in Credit Card Fraud Detection

Published: 2022-09-01

Author:

Peng Haili¹, Wang Jing¹

Affiliation:

1. Huaibei Normal University

Abstract

Imbalanced data often degrades model performance and can prevent a model from learning the minority classes, so handling the imbalance is crucial. This paper studies credit card fraud detection on imbalanced data: it compares different methods for processing imbalanced data, applies machine learning to detect fraud, and identifies the best-performing configuration. Credit card fraud detection is mostly a binary classification problem, and fraudulent transactions make up only a small percentage of the data, so a machine learning model tends to favor the majority class and to treat fraudulent transactions as legitimate ones. We used several methods to mitigate the class imbalance (oversampling, undersampling, combined sampling, and class weights), applied them to credit card fraud detection, and calculated accuracy, precision, recall, F1 score, and AUC. Because the data are severely imbalanced, the model is biased toward the majority class, so its accuracy will be high even when it detects little fraud. Since we care more about whether minority-class samples are correctly classified, we evaluate the model with the F1 score, the area under the precision-recall curve (AUPRC), and recall instead of accuracy. The results show that resampling the credit card fraud data gives the best model performance, that finding the optimal class weights by grid search significantly improves the logistic regression model, and that the random forest outperforms all the other machine learning models compared.
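The abstract's point about accuracy can be illustrated with a minimal sketch. It is not the authors' code or data: the labels are synthetic (990 legitimate, 10 fraudulent, chosen arbitrarily), and the helper names `binary_metrics` and `random_oversample` are hypothetical. The sketch shows that a classifier which labels every transaction as legitimate still scores 99% accuracy while its recall and F1 for the fraud class are zero, and it shows one simple form of random oversampling, which is only one of the resampling strategies the abstract mentions.

```python
import random


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    ones = [i for i, y in enumerate(labels) if y == 1]
    zeros = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (ones, zeros) if len(ones) <= len(zeros) else (zeros, ones)
    if not minority:
        raise ValueError("cannot oversample: one class has no samples")
    # Sample with replacement from the minority class to fill the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Synthetic, made-up data: 990 legitimate (0) and 10 fraudulent (1) transactions.
y_true = [0] * 990 + [1] * 10
rows = [[i] for i in range(len(y_true))]  # placeholder one-feature rows

# A trivial "model" that predicts the majority class for every transaction.
always_legit = [0] * len(y_true)
metrics = binary_metrics(y_true, always_legit)
print(metrics)

# Balance the classes before training (apply to the training split only).
bal_rows, bal_labels = random_oversample(rows, y_true)
print(len(bal_labels), sum(bal_labels))
```

In practice, resampling would be applied only to the training split so that the evaluation data keep the original class ratio; otherwise precision and AUPRC would be measured on an artificial distribution.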

Publisher

Research Square Platform LLC

