Abstract
We use a rich set of transaction data from a large retailer in India, together with a dataset on bribe payments, to train random forest and XGBoost models on empirical measures guided by Benford’s law, a commonly used tool in forensic analytics. We evaluate model performance around the 2016 Indian demonetization, which altered the distribution of legal tender notes in India. Models trained and tested only on pre-2016 data, or only on post-2016 data, achieve F1 scores of around 90%, suggesting that these models and the Benford’s law criteria contain meaningful information for detecting bribe payments. However, when a model is trained in one regime and tested in the other, performance falls dramatically to less than 10%, highlighting the role of the institutional setting when using financial data analytics in an environment subject to regime shifts.
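To make the idea of a Benford-guided measure concrete, the sketch below computes one widely used conformity statistic: the mean absolute deviation (MAD) between the observed first-digit frequencies of a set of amounts and the frequencies Benford’s law predicts, log10(1 + 1/d) for d = 1, ..., 9. The abstract does not say which measures the models actually use, so the choice of MAD and the function names `benford_expected`, `first_digit`, and `first_digit_mad` are assumptions made purely for illustration.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law probability of each leading digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """Leading non-zero digit of a non-zero number (sign ignored)."""
    # Scientific notation puts the leading digit first, e.g. 0.05 -> '5.000...e-02'.
    return int(f"{abs(amount):.15e}"[0])


def first_digit_mad(amounts):
    """Mean absolute deviation between observed and Benford first-digit shares.

    Hypothetical helper: one possible Benford-based feature, not necessarily
    the measure used in the study.
    """
    digits = [first_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)
    n = len(digits)
    expected = benford_expected()
    return sum(abs(counts.get(d, 0) / n - expected[d]) for d in range(1, 10)) / 9


# Powers of two are a classic example of a sequence that follows Benford's law.
benford_like = [2 ** k for k in range(1000)]
# Identical round amounts concentrate all mass on a single leading digit.
round_amounts = [500] * 1000

print(first_digit_mad(benford_like))
print(first_digit_mad(round_amounts))
```

A statistic like this, computed per group of transactions, could serve as one input feature to a classifier. Because it depends on the distribution of amounts, and therefore on which note denominations are in circulation, it is plausible that such features shift across a regime change like the demonetization.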
Funder
Singapore Ministry of Education
Publisher
Public Library of Science (PLoS)