Abstract
Conventional police databases contain much information on cybercrime, but extracting it remains a practical challenge because these databases rarely contain labels that could be used to automatically retrieve all cybercrime incidents. In this article, we present a supervised machine learning method for extracting cybercrime incidents from calls-for-police-service datasets. Data from the Korean National Police (2020, 9 months, N = 15 million call logs) are used for the demonstration. We combined keyword query selection, minority oversampling, and majority voting to develop a classifier. Three classification techniques (Naïve Bayes, linear SVM, and kernel SVM) were tested, and the kernel SVM was chosen to build the final model (accuracy, 93.4%; F1-score, 92.4). We estimate that cybercrime represents only 4.6% of the cases in the selected dataset (excluding traffic-related incidents), but that it can be prevalent within some crime types. We found, for example, that about three quarters (76%) of all fraud incidents have a cyber dimension. We conclude that the cybercrime classification method proposed in this study can support further research on cybercrime and that it offers considerable advantages over manual or keyword-based approaches.
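The abstract names minority oversampling and majority voting but does not say which variants were used. The sketch below illustrates one simple reading of each, assuming random duplication of minority-class records and an unweighted vote across an odd number of classifiers. The function names and the toy data are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of the rarer classes until every class
    is as frequent as the largest one (assumed variant, not the paper's)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def majority_vote(predictions):
    """Combine several classifiers' label lists into one label per item.
    Use an odd number of voters so that binary votes cannot tie."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy call logs: 1 = cyber-related, 0 = not (hypothetical labels).
calls = ["phishing link by text", "car stolen", "noise complaint", "shop burglary"]
labels = [1, 0, 0, 0]
balanced_calls, balanced_labels = oversample_minority(calls, labels)

# Three hypothetical classifiers voting on three incidents.
combined = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

With the toy data, the balanced set holds three records of each class, and the combined vote labels the first and third incidents as cyber-related.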
Publisher
Springer Science and Business Media LLC