Threshold optimization and random undersampling for imbalanced credit card data

Published: 2023-05-06
Volume: 10, Issue: 1
ISSN: 2196-1115
Journal: Journal of Big Data (J Big Data)
Language: English

Authors:

Joffrey L. Leevy, Justin M. Johnson, John Hancock, Taghi M. Khoshgoftaar

Abstract

Output thresholding is well suited to addressing class imbalance, since the technique does not increase dataset size, risk discarding important instances, or modify an existing learner. Using the Credit Card Fraud Detection Dataset, this study proposes a threshold optimization approach that incorporates the constraint True Positive Rate (TPR) ≥ True Negative Rate (TNR). Our findings indicate that an increase in the Area Under the Precision–Recall Curve (AUPRC) score is associated with an improvement in threshold-based classification scores, while an increase in the positive class prior probability causes optimal thresholds to increase. In addition, we found that the best overall results for the selection of an optimal threshold are obtained without the use of Random Undersampling (RUS). Furthermore, with the exception of AUPRC, we established that the default threshold yields good performance scores at a balanced class ratio. Our evaluation of four threshold optimization techniques, eight threshold-dependent metrics, and two threshold-agnostic metrics is what makes this research unique.
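The abstract does not spell out the four threshold optimization techniques, so the following is only an illustrative sketch under stated assumptions, not the authors' code. It scans each distinct classifier score as a candidate threshold, discards candidates where TPR < TNR (the constraint named above), and keeps the one with the highest geometric mean of TPR and TNR. The G-mean objective, the helper names `rates_at`, `select_threshold`, and `random_undersample`, and the toy scores are all hypothetical. A minimal RUS helper is included because the study compares results with and without it.

```python
import random
from typing import List, Optional, Sequence, Tuple


def rates_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Return (TPR, TNR) when every score >= threshold is labeled positive (1)."""
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return tpr, tnr


def select_threshold(labels: Sequence[int], scores: Sequence[float]) -> Tuple[Optional[float], float]:
    """Hypothetical criterion: among candidate thresholds with TPR >= TNR,
    return the one with the highest geometric mean sqrt(TPR * TNR)."""
    best_threshold: Optional[float] = None
    best_gmean = -1.0
    for t in sorted(set(scores)):  # each distinct score is a candidate cut-off
        tpr, tnr = rates_at(labels, scores, t)
        if tpr < tnr:  # enforce the TPR >= TNR constraint
            continue
        gmean = (tpr * tnr) ** 0.5
        if gmean > best_gmean:  # strict '>' keeps the lowest threshold on ties
            best_threshold, best_gmean = t, gmean
    return best_threshold, best_gmean


def random_undersample(labels: Sequence[int], positive_fraction: float, seed: int = 0) -> List[int]:
    """Random undersampling (RUS): keep every positive index and a random subset
    of negative indices so positives make up about positive_fraction of the result."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_keep = round(len(positives) * (1 - positive_fraction) / positive_fraction)
    n_keep = min(len(negatives), n_keep)
    return sorted(positives + rng.sample(negatives, n_keep))


if __name__ == "__main__":
    # Made-up scores standing in for the output of an already-trained classifier.
    y = [0, 0, 0, 0, 1, 1]
    s = [0.1, 0.2, 0.3, 0.6, 0.5, 0.9]
    # Selects threshold 0.5 (TPR = 1.0, TNR = 0.75), G-mean = sqrt(0.75).
    print(select_threshold(y, s))
```

Because TPR falls and TNR rises as the threshold increases, the constraint effectively caps how high the threshold may go. Applying `random_undersample` before training raises the positive class prior, which, according to the findings above, tends to push the optimal threshold upward. Swapping G-mean for another threshold-dependent metric changes only the `gmean` line.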

Publisher

Springer Science and Business Media LLC

Subject

Information Systems and Management, Computer Networks and Communications, Hardware and Architecture, Information Systems

Link

https://link.springer.com/content/pdf/10.1186/s40537-023-00738-z.pdf

References (24 articles)

1. Leevy JL, Khoshgoftaar TM, Bauder RA, Seliya N. A survey on addressing high-class imbalance in big data. J Big Data. 2018;5(1):42.

2. Kesici M, Saner CB, Yaslan Y, Genc VI. Cost sensitive class-weighting approach for transient instability prediction using convolutional neural networks. In: 2019 11th international conference on electrical and electronics engineering (ELECO). IEEE; 2019. p. 141–5.

3. Johnson JM, Khoshgoftaar TM. Output thresholding for ensemble learners and imbalanced big data. In: 2021 IEEE 33rd international conference on tools with artificial intelligence (ICTAI). IEEE; 2021. p. 1449–54.

4. Hasanin T, Khoshgoftaar TM, Leevy JL, Bauder RA. Severely imbalanced big data challenges: investigating data sampling approaches. J Big Data. 2019;6(1):1–25.

5. Seiffert C, Khoshgoftaar TM, Van Hulse J, Napolitano A. A comparative study of data sampling and cost sensitive learning. In: 2008 IEEE international conference on data mining workshops. IEEE; 2008. p. 46–52.

Cited by (10 articles)

1. MVQS: Robust multi-view instance-level cost-sensitive learning method for imbalanced data classification. Information Sciences. 2024-07.

2. Improving Credit Card Fraud Detection with Data Reduction Approaches. International Journal of Reliability, Quality and Safety Engineering. 2024-05-15.

3. Addressing diversity in hiring procedures: a generative adversarial network approach. AI and Ethics. 2024-05-02.

4. Synthesizing class labels for highly imbalanced credit card fraud detection data. Journal of Big Data. 2024-03-09.

5. Comparison of Undersampling Methods for Imbalanced Credit Card Fraud Dataset. 2023 3rd International Conference on Advancement in Electronics & Communication Engineering (AECE). 2023-11-23.