Leveraging augmentation techniques for tasks with unbalancedness within the financial domain: a two-level ensemble approach

Author:

Ranjbaran Golshid,Reforgiato Recupero Diego,Lombardo Gianfranco,Consoli SergioORCID

Abstract

AbstractModern financial markets produce massive datasets that need to be analysed using new modelling techniques like those from (deep) Machine Learning and Artificial Intelligence. The common goal of these techniques is to forecast the behaviour of the market, which can be translated into various classification tasks, such as, for instance, predicting the likelihood of companies’ bankruptcy or in fraud detection systems. However, it is often the case that real-world financial data are unbalanced, meaning that the classes’ distribution is not equally represented in such datasets. This gives the main issue since any Machine Learning model is trained according to the majority class mainly, leading to inaccurate predictions. In this paper, we explore different data augmentation techniques to deal with very unbalanced financial data. We consider a number of publicly available datasets, then apply state-of-the-art augmentation strategies to them, and finally evaluate the results for several Machine Learning models trained on the sampled data. The performance of the various approaches is evaluated according to their accuracy, micro, and macro F1 score, and finally by analyzing the precision and recall over the minority class. We show that a consistent and accurate improvement is achieved when data augmentation is employed. The obtained classification results look promising and indicate the efficiency of augmentation strategies on financial tasks. On the basis of these results, we present an approach focused on classification tasks within the financial domain that takes a dataset as input, identifies what kind of augmentation technique to use, and then applies an ensemble of all the augmentation techniques of the identified type to the input dataset along with an ensemble of different methods to tackle the underlying classification.

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics,Computer Science Applications,Modeling and Simulation

Reference84 articles.

1. Agrawal A, Viktor HL, Paquet E (2015) Scut: multi-class imbalanced data classification using smote and cluster-based undersampling. In: The international joint conference on knowledge discovery, knowledge engineering and knowledge management (IC3k), vol 1. IEEE, New York, pp 226–234

2. Ahmad H, Kasasbeh B, Aldabaybah B, Rawashdeh E (2023) Class balancing framework for credit card fraud detection based on clustering and similarity-based selection (SBS). Int J Inf Technol 15(1):325–333

3. Alarfaj FK, Malik I, Khan HU, Almusallam N, Ramzan M, Ahmed M (2022) Credit card fraud detection using state-of-the-art machine learning and deep learning algorithms. IEEE Access 10:39700–39715

4. Alfaiz NS, Fati SM (2022) Enhanced credit card fraud detection model using machine learning. Electronics 11(4):662

5. Awad M, Khanna R (2015) Support vector machines for classification. In: Efficient learning machines. Springer, Berlin, pp 39–66

Cited by 2 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3