Abstract
One of the fundamental challenges when dealing with medical imaging datasets is class imbalance. Class imbalance occurs when the class of interest has relatively few instances compared with the rest of the data. This study applies oversampling strategies to balance the classes and improve classification performance. We evaluated four classifiers, k-nearest neighbors (k-NN), support vector machine (SVM), multilayer perceptron (MLP), and decision trees (DT), with 73 oversampling strategies. We used imbalanced-learning oversampling techniques to improve classification in datasets that are notably sparse and clustered. This work reports the best combinations of oversampling strategy and classifier and concludes that using an oversampling method always outperforms using none, thereby improving the classification results.
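To make the idea of oversampling concrete, here is a minimal sketch of random oversampling, the simplest strategy of this kind: minority-class samples are duplicated at random until every class matches the size of the largest one. The abstract does not list the 73 strategies that were evaluated, so this should be read as an illustration only and not as the authors' method. The function name `random_oversample` and the toy data are assumptions introduced for the example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest class.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original samples that belong to this class.
        pool = [i for i, lab in enumerate(y) if lab == label]
        # Draw with replacement until this class reaches the target size.
        for _ in range(target - count):
            i = rng.choice(pool)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data (hypothetical): 8 negatives, 2 positives.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

When a balancing step like this is used ahead of a classifier such as k-NN, SVM, MLP, or DT, it is usually applied to the training split only, so that duplicated samples do not leak into the test set.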
Publisher
Public Library of Science (PLoS)
Cited by
16 articles.