Affiliation:
1. Monash University, Clayton, Victoria, Australia
Abstract
This paper proposes an innovative fraud detection method, built upon existing fraud detection research and
Minority Report
, to deal with the data mining problem of skewed data distributions. This method uses backpropagation (BP), together with naive Bayesian (NB) and C4.5 algorithms, on data partitions derived from minority oversampling with replacement. Its originality lies in the use of a single meta-classifier (stacking) to choose the best base classifiers, and then combine these base classifiers' predictions (bagging) to improve cost savings (stacking-bagging). Results from a publicly available automobile insurance fraud detection data set demonstrate that stacking-bagging performs slightly better than the best performing bagged algorithm, C4.5, and its best classifier, C4.5 (2), in terms of cost savings. Stacking-bagging also outperforms the common technique used in industry (BP without both sampling and partitioning). Subsequently, this paper compares the new fraud detection method (meta-learning approach) against C4.5 trained using undersampling, oversampling, and SMOTEing without partitioning (sampling approach). Results show that, given a fixed decision threshold and cost matrix, the partitioning and multiple algorithms approach achieves marginally higher cost savings than varying the entire training data set with different class distributions. The most interesting find is confirming that the combination of classifiers to produce the best cost savings has its contributions from all three algorithms.
Publisher
Association for Computing Machinery (ACM)
Cited by
218 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献