An oversampling method for imbalanced data based on spatial distribution of minority samples SD-KMSMOTE

Published:2022-10-07 Issue:1 Volume:12 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Yang Wensheng,Pan Chengsheng,Zhang Yanyan

Abstract

With the rapid expansion of data, the problem of data imbalance has become increasingly prominent in fields such as medical treatment, finance, and networking, and it is typically addressed by oversampling. However, most existing oversampling methods sample randomly or sample only from a particular area, which affects the classification results. To address these limitations, this study proposes SD-KMSMOTE, an oversampling method for imbalanced data based on the spatial distribution of minority samples. A noise-filtering pre-treatment is added: the category information of the neighbouring samples is considered, and existing noise in the minority class is removed. These conditions lead to the design of a new sample synthesis method, and the rules for calculating the weight values are constructed on this basis. The spatial distribution of minority class samples is considered comprehensively: the samples are clustered, and the sub-clusters that contain useful information are assigned larger weight values and more synthetic samples. The experimental results show that the proposed method outperforms existing methods in terms of precision, recall, F1 score, G-mean, and area under the curve when it is used to expand imbalanced datasets in medicine and other fields.
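The pipeline outlined in the abstract (filter noise, cluster the minority class, weight the sub-clusters, synthesize within them) can be illustrated with a rough sketch. This is not the authors' SD-KMSMOTE implementation: the abstract does not give the noise rule or the weight formula, so the sketch assumes that a minority sample is noise when all of its k nearest neighbours belong to the majority class, and that a sub-cluster's weight is its mean distance to its own centroid. All function names (`filter_noise`, `kmeans`, `spread`, `oversample`) are hypothetical.

```python
import math
import random


def filter_noise(minority, majority, k=3):
    """Assumed rule: drop a minority point if all k nearest neighbours are majority."""
    kept = []
    for i, p in enumerate(minority):
        pool = [(q, 1) for j, q in enumerate(minority) if j != i]
        pool += [(q, 0) for q in majority]
        nearest = sorted(pool, key=lambda item: math.dist(p, item[0]))[:k]
        if any(label == 1 for _, label in nearest):
            kept.append(p)
    return kept


def kmeans(points, n_clusters, rng, n_iter=20):
    """Plain Lloyd iterations; returns the non-empty clusters as lists of points."""
    centres = rng.sample(points, n_clusters)
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in centres]
        for p in points:
            idx = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            clusters[idx].append(p)
        centres = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centres[i]
            for i, members in enumerate(clusters)
        ]
    return [members for members in clusters if members]


def spread(cluster):
    """Assumed weight: mean distance of the cluster's points to its centroid."""
    centre = [sum(c) / len(cluster) for c in zip(*cluster)]
    return sum(math.dist(p, centre) for p in cluster) / len(cluster)


def oversample(minority, majority, n_new, n_clusters=2, k=3, seed=0):
    """Return (cleaned minority points, n_new synthetic minority points)."""
    rng = random.Random(seed)
    clean = filter_noise(minority, majority, k)
    if not clean:
        return [], []
    clusters = kmeans(clean, min(n_clusters, len(clean)), rng)
    weights = [spread(c) for c in clusters]
    if sum(weights) == 0:  # e.g. every cluster holds a single point
        weights = [1.0] * len(clusters)
    synthetic = []
    # Clusters with a larger weight are drawn more often, so they receive more samples.
    for cluster in rng.choices(clusters, weights=weights, k=n_new):
        a, b = rng.choice(cluster), rng.choice(cluster)
        u = rng.random()
        # SMOTE-style interpolation between two points of the same sub-cluster.
        synthetic.append(tuple(x + u * (y - x) for x, y in zip(a, b)))
    return clean, synthetic
```

Because interpolation happens only between points of the same sub-cluster, synthetic samples stay inside the region that sub-cluster already occupies. Drawing the sub-cluster for each new sample with `random.choices` makes the number of samples per sub-cluster proportional to its weight in expectation, while the total stays exactly `n_new`.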

Funder

National Natural Science Foundation of China

Jiangsu Innovation & Entrepreneurship Group Talents Plan

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-022-21046-1.pdf

Reference: 32 articles.

1. Almhaithawi, D., Jafar, A. & Aljnidi, M. Correction to: Example-dependent cost-sensitive credit cards fraud detection using SMOTE and Bayes minimum risk. SN Appl. Sci. 2, (2020).

2. Liu, N., Li, X., Qi, E., Xu, M. & Gao, B. A Novel Ensemble Learning Paradigm for Medical Diagnosis with Imbalanced Data. IEEE Access PP, 1–1 (2020).