Abstract
AbstractOversampling is a promising preprocessing technique for imbalanced datasets which generates new minority instances to balance the dataset. However, improper generated minority instances, i.e., noise instances, may interfere the learning of the classifier and impact it negatively. Given this, in this paper, we propose a simple and effective oversampling approach known as ASN-SMOTE based on the k-nearest neighbors and the synthetic minority oversampling technology (SMOTE). ASN-SMOTE first filters noise in the minority class by determining whether the nearest neighbor of each minority instance belongs to the minority or majority class. After that, ASN-SMOTE uses the nearest majority instance of each minority instance to effectively perceive the decision boundary, inside which the qualified minority instances are selected adaptively for each minority instance by the proposed adaptive neighbor selection scheme to synthesize new minority instance. To substantiate the effectiveness, ASN-SMOTE has been applied to three different classifiers and comprehensive experiments have been conducted on 24 imbalanced benchmark datasets. ASN-SMOTE is also extensively compared with nine notable oversampling algorithms. The results show that ASN-SMOTE achieves the best results in the majority of datasets. The ASN-SMOTE implementation is available at: https://www.github.com/yixinkai123/ASN-SMOTE/.
Funder
natural science foundation of zhejiang province
national undergraduate innovation and entrepreneurship training program
Publisher
Springer Science and Business Media LLC
Subject
General Earth and Planetary Sciences,General Environmental Science
Reference60 articles.
1. Alcalá-Fdez, Fernández J, Luengo A, Derrac J, García J, Sánchez S, Herrera L F (2011) Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Logic Soft Computg 17: 255–287
2. Bach M, Werner A, żywiec J, Pluskiewicz W (2017) The study of under- and over-sampling methods’ utility in analysis of highly imbalanced data on osteoporosis. Inf Sci 384:174–190. https://doi.org/10.1016/j.ins.2016.09.038
3. Barua S, Islam MM, Yao X, Murase K (2014) MWMOTE-majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans Knowl Data Eng 26(2):405–425. https://doi.org/10.1109/TKDE.2012.232
4. Batista GEAPA, Prati RC, Monard MC (2004) A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explor Newsl 6(1):20–29. https://doi.org/10.1145/1007730.1007735
5. Bedi P, Gupta N, Jindal V (2020) I-SiamIDS: an improved Siam-IDS for handling class imbalance in network-based intrusion detection systems. Appl Intell 51:1133–1151 (2021). https://doi.org/10.1007/s10489-020-01886-y
Cited by
23 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献