Leveraging augmentation techniques for tasks with unbalancedness within the financial domain: a two-level ensemble approach

Published:2023-07-10 Issue:1 Volume:12
ISSN:2193-1127
Container-title:EPJ Data Science
language:en
Short-container-title:EPJ Data Sci.

Author:

Ranjbaran Golshid,Reforgiato Recupero Diego,Lombardo Gianfranco,Consoli Sergio

Abstract

Modern financial markets produce massive datasets that need to be analysed with new modelling techniques such as those from (deep) Machine Learning and Artificial Intelligence. The common goal of these techniques is to forecast the behaviour of the market, which can be translated into various classification tasks, for instance predicting the likelihood of a company's bankruptcy or detecting fraud. However, real-world financial data are often unbalanced, meaning that the classes are not equally represented in such datasets. This is the main issue, since a Machine Learning model trained on such data is fitted mainly to the majority class, leading to inaccurate predictions. In this paper, we explore different data augmentation techniques to deal with very unbalanced financial data. We consider a number of publicly available datasets, apply state-of-the-art augmentation strategies to them, and evaluate the results for several Machine Learning models trained on the sampled data. The performance of the various approaches is evaluated according to their accuracy and their micro and macro F1 scores, and finally by analysing the precision and recall over the minority class. We show that a consistent and accurate improvement is achieved when data augmentation is employed. The obtained classification results look promising and indicate the efficiency of augmentation strategies on financial tasks. On the basis of these results, we present an approach focused on classification tasks within the financial domain that takes a dataset as input, identifies what kind of augmentation technique to use, and then applies an ensemble of all the augmentation techniques of the identified type to the input dataset, along with an ensemble of different methods to tackle the underlying classification.
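The evaluation concern raised in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the toy labels, the helper names (`precision_recall_f1`, `macro_f1`, `micro_f1`, `random_oversample`), and the use of plain random oversampling as a stand-in for the augmentation strategies are all assumptions made for illustration. It shows why a classifier that always predicts the majority class scores well on accuracy yet fails on macro F1 and on minority-class precision and recall.

```python
import random
from collections import Counter


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in labels) / len(labels)


def micro_f1(y_true, y_pred):
    """For single-label classification, micro F1 coincides with accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def random_oversample(rows, labels, seed=0):
    """Hypothetical stand-in for an augmentation step: duplicate randomly
    chosen samples of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (assumed): 95 majority samples (class 0), 5 minority samples (class 1).
y_true = [0] * 95 + [1] * 5
y_majority = [0] * 100  # a model that always predicts the majority class

accuracy = micro_f1(y_true, y_majority)
minority_p, minority_r, minority_f1 = precision_recall_f1(y_true, y_majority, 1)
macro = macro_f1(y_true, y_majority)

# Rebalance the toy training set before fitting a model on it.
rows = list(range(100))  # placeholder feature rows
balanced_rows, balanced_labels = random_oversample(rows, y_true)
balanced_counts = Counter(balanced_labels)
```

On this toy data the majority-class predictor reaches an accuracy of 0.95 while its minority-class recall is 0, which is why reporting macro F1 and minority-class precision and recall alongside accuracy matters for unbalanced tasks.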

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics,Computer Science Applications,Modeling and Simulation

Link

https://link.springer.com/content/pdf/10.1140/epjds/s13688-023-00402-9.pdf

Reference: 84 articles.

1. Agrawal A, Viktor HL, Paquet E (2015) SCUT: multi-class imbalanced data classification using SMOTE and cluster-based undersampling. In: The international joint conference on knowledge discovery, knowledge engineering and knowledge management (IC3K), vol 1. IEEE, New York, pp 226–234

2. Ahmad H, Kasasbeh B, Aldabaybah B, Rawashdeh E (2023) Class balancing framework for credit card fraud detection based on clustering and similarity-based selection (SBS). Int J Inf Technol 15(1):325–333

3. Alarfaj FK, Malik I, Khan HU, Almusallam N, Ramzan M, Ahmed M (2022) Credit card fraud detection using state-of-the-art machine learning and deep learning algorithms. IEEE Access 10:39700–39715

4. Alfaiz NS, Fati SM (2022) Enhanced credit card fraud detection model using machine learning. Electronics 11(4):662

5. Awad M, Khanna R (2015) Support vector machines for classification. In: Efficient learning machines. Springer, Berlin, pp 39–66

Cited by 2 articles.

1. Machine learning techniques in bankruptcy prediction: A systematic literature review;Expert Systems with Applications;2024-12

2. Generative Adversarial Networks for Synthetic Data Generation in Finance: Evaluating Statistical Similarities and Quality Assessment;AI;2024-05-13