The effect of feature extraction and data sampling on credit card fraud detection

Published:2023-01-17 Issue:1 Volume:10
ISSN:2196-1115
Container-title:Journal of Big Data
language:en
Short-container-title:J Big Data

Author:

Salekshahrezaee Zahra,Leevy Joffrey L.,Khoshgoftaar Taghi M.

Abstract

Training a machine learning algorithm on a class-imbalanced dataset can be a difficult task, a process that could prove even more challenging under conditions of high dimensionality. Feature extraction and data sampling are among the most popular preprocessing techniques. Feature extraction is used to derive a richer set of reduced dataset features, while data sampling is used to mitigate class imbalance. In this paper, we investigate these two preprocessing techniques, using a credit card fraud dataset and four ensemble classifiers (Random Forest, CatBoost, LightGBM, and XGBoost). Within the context of feature extraction, the Principal Component Analysis (PCA) and Convolutional Autoencoder (CAE) methods are evaluated. With regard to data sampling, the Random Undersampling (RUS), Synthetic Minority Oversampling Technique (SMOTE), and SMOTE Tomek methods are evaluated. The F1 score and Area Under the Receiver Operating Characteristic Curve (AUC) metrics serve as measures of classification performance. Our results show that the implementation of the RUS method followed by the CAE method leads to the best performance for credit card fraud detection.
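As a rough illustration of two ideas named in the abstract, the sketch below shows random undersampling (RUS) of the majority class and the F1 score for the positive (fraud) class, using only the Python standard library. It is not the authors' implementation: the function names `random_undersample` and `f1_score`, the `majority_per_minority` parameter, and the toy data are all assumptions made for this example, and the feature extraction step (PCA or CAE), the ensemble classifiers, and AUC are left out.

```python
import random
from collections import Counter


def random_undersample(X, y, majority_per_minority=1.0, seed=0):
    """Randomly drop majority-class rows until the requested ratio is reached.

    Assumption for this sketch: labels are binary and 1 marks the
    minority (fraud) class.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    # Keep at most `majority_per_minority` majority rows per minority row.
    n_keep = min(len(majority), int(round(len(minority) * majority_per_minority)))
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [X[i] for i in kept], [y[i] for i in kept]


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall), positive class = 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1,000 legitimate rows (label 0), 10 fraud rows (label 1).
X = [[float(i)] for i in range(1010)]
y = [0] * 1000 + [1] * 10

X_rus, y_rus = random_undersample(X, y, majority_per_minority=1.0, seed=42)
class_counts = Counter(y_rus)  # classes are balanced after undersampling
```

In a fuller pipeline, the undersampled rows would typically be passed to a feature extractor and then to a classifier, with F1 and AUC computed on held-out data that has not been resampled.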

Publisher

Springer Science and Business Media LLC

Subject

Information Systems and Management,Computer Networks and Communications,Hardware and Architecture,Information Systems

Link

https://link.springer.com/content/pdf/10.1186/s40537-023-00684-w.pdf

Cited by 25 articles.

1. Predictive modeling and insight into protein fouling in microfiltration and ultrafiltration through one-dimensional convolutional models;Separation and Purification Technology;2025-01

2. Digital twin for credit card fraud detection: opportunities, challenges, and fraud detection advancements;Future Generation Computer Systems;2024-09

3. Fraud Detection Based on Credit Review Texts with Dual Channel Memory Networks;Applied Artificial Intelligence;2024-08-27

4. Generative AI in Network Security and Intrusion Detection;Advances in Information Security, Privacy, and Ethics;2024-07-26

5. A Combinatorial Predictive Method for Fraud Identification to Uphold Security and Data Integrity;Advances in Business Information Systems and Analytics;2024-06-28