Author:
Jensen Rasmus Ingemann Tuffveson, Ferwerda Joras, Jørgensen Kristian Sand, Jensen Erik Rathje, Borg Martin, Krogh Morten Persson, Jensen Jonas Brunholm, Iosifidis Alexandros
Abstract
Bank transactions are highly confidential. As a result, there are no real public data sets that can be used to investigate and compare anti-money laundering (AML) methods in banks. This severely limits research on important AML problems such as efficiency, effectiveness, class imbalance, concept drift, and interpretability. To address the issue, we present SynthAML: a synthetic data set to benchmark statistical and machine learning methods for AML. The data set builds on real data from Spar Nord, a systemically important Danish bank, and contains 20,000 AML alerts and over 16 million transactions. Experimental results indicate that performance on SynthAML can be transferred to the real world. As use cases, we present and discuss open problems in the AML literature.
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability
Cited by
2 articles.