Affiliation:
1. College of Intelligent Information Engineering, Chongqing Aerospace Polytechnic, Chongqing 400021, P. R. China
Abstract
Classification performance is often degraded by imbalanced class distributions. The synthetic minority oversampling technique (SMOTE) has been successful in improving imbalanced classification and has been widely acclaimed. Overgeneralization is one of the most serious challenges in SMOTE. Although multiple SMOTE-based variants have been proposed to counter overgeneralization, they still have the following shortcomings: (a) they create too many synthetic samples in high-density regions; (b) they remove suspicious noise directly instead of modifying it; (c) they rely on many parameters. This paper proposes a new SMOTE based on adaptive noise optimization and fast search for local sets (SMOTE-ANO-FLS) to overcome overgeneralization and the shortcomings of existing works. First, SMOTE-ANO-FLS uses a k-D tree to quickly search the local set of each sample. Second, a new noise detection method based on local sets and the imbalance ratio is proposed to detect suspicious noise. Third, a new adaptive noise optimization method is proposed to modify detected suspicious noise instead of removing it. Finally, a new probability weight based on local sets is proposed to help create more synthetic minority class samples in borderline and sparse regions. The effectiveness of SMOTE-ANO-FLS is demonstrated in experiments involving 7 oversampling methods and a random forest classifier on extensive synthetic and real data sets.
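The abstract does not give the paper's exact definitions, so the following Python sketch only illustrates the generic building blocks it names. It shows the classic SMOTE interpolation x_new = x_i + δ(x_j − x_i), with δ drawn uniformly from [0, 1), and a local set computed under the common definition: all samples closer to x_i than its nearest sample of another class. The function names, the brute-force neighbor search (the paper uses a k-D tree instead), and the seed weight 1 / (|LS(i)| + 1) are assumptions for illustration, not SMOTE-ANO-FLS itself. The noise detection and adaptive noise optimization steps are omitted because the abstract does not specify them.

```python
import math
import random


def nearest_enemy_distance(i, X, y):
    """Distance from X[i] to the closest sample with a different label.

    Assumes the data contain at least two classes.
    """
    return min(math.dist(X[i], X[j]) for j in range(len(X)) if y[j] != y[i])


def local_set(i, X, y):
    """Indices of samples strictly closer to X[i] than its nearest enemy.

    Assumed (common) local-set definition; every member shares label y[i].
    Brute force for clarity; a k-D tree would answer this query faster.
    """
    radius = nearest_enemy_distance(i, X, y)
    return [j for j in range(len(X)) if j != i and math.dist(X[i], X[j]) < radius]


def smote_sample(x, neighbor, rng):
    """Classic SMOTE interpolation: x + gap * (neighbor - x), gap in [0, 1)."""
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]


def oversample(X, y, minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (hypothetical weighting).

    Seeds with small local sets (near the border or in sparse regions) are
    drawn more often via the illustrative weight 1 / (|LS(i)| + 1).
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    idx = [i for i in range(len(X)) if y[i] == minority]
    weights = [1.0 / (len(local_set(i, X, y)) + 1) for i in idx]
    synthetic = []
    for _ in range(n_new):
        # Pick a seed sample according to the hypothetical weights.
        i = rng.choices(idx, weights=weights, k=1)[0]
        # k nearest minority neighbors of the chosen seed.
        neighbors = sorted((j for j in idx if j != i),
                           key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(neighbors)
        synthetic.append(smote_sample(X[i], X[j], rng))
    return synthetic


# Toy usage: three minority points near the origin, four majority points far away.
X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6], [6, 6]]
y = [1, 1, 1, 0, 0, 0, 0]
print(local_set(0, X, y))  # [1, 2]
print(len(oversample(X, y, minority=1, n_new=10)))  # 10
```

A smaller local set means the seed's nearest enemy is close, which is why the illustrative weight favors borderline seeds; a practical implementation would replace the linear scans with a spatial index such as a k-D tree.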
Funder
Youth Project of Science and Technology Research Program of Chongqing Education Commission of China
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
4 articles.