SMOTE and Nearmiss Methods for Disease Classification with Unbalanced Data

Published: 2022-01-04
Volume: 2021, Issue: 1, Pages: 305-314
ISSN: 2809-9842
Published in: Proceedings of The International Conference on Data Science and Official Statistics (short title: icdsos)

Authors

Alamsyah Anas Rulloh Budi, Anisa Salsabila Rahma, Belinda Nadira Sri, Setiawan Adi

Abstract

Unbalanced data are common in practice and make it harder to find a suitable classification model. In disease data, for example, the number of individuals with a history of a given disease is smaller than the number without one. We analyse medical-history data from the fifth wave of the Indonesia Family Life Survey (IFLS 5). The dataset is split 80:20 into training and test subsets; both subsets are unbalanced, with only a small minority of patients having had a stroke. We apply the SMOTE and Nearmiss methods to the training data, which turns it into balanced data, and then carry out classification to compare how effectively the two methods address the imbalance, evaluating the rate of correct classification. Comparing accuracy, F-score, Kappa, sensitivity, and specificity for the SMOTE and Nearmiss methods, we conclude that Nearmiss is better than SMOTE at balancing the data.
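The two balancing strategies compared above work in opposite directions: SMOTE oversamples the minority class by creating synthetic points, while Nearmiss undersamples the majority class. The following is a minimal pure-Python sketch, not the authors' code. It assumes numeric feature vectors, uses basic SMOTE interpolation and the NearMiss-1 variant (the abstract does not say which Nearmiss version was used), and codes the minority (stroke) class as the positive label 1 when computing the five reported measures. The function names `smote`, `nearmiss1`, and `binary_metrics` and the toy Gaussian data are hypothetical. In practice, the imbalanced-learn package provides `SMOTE` and `NearMiss` classes with a `fit_resample` method.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Basic SMOTE (sketch): create n_new synthetic minority samples, each at a
    random point on the segment between a minority sample and one of its
    k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(0) if rng is None else rng
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (x itself excluded by index)
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def nearmiss1(majority, minority, n_keep, k=3):
    """NearMiss-1 (sketch): keep the n_keep majority samples whose mean
    distance to their k nearest minority samples is smallest."""
    if not minority:
        raise ValueError("NearMiss needs at least one minority sample")
    k = min(k, len(minority))

    def mean_dist_to_nearest_minority(x):
        nearest = sorted(math.dist(x, m) for m in minority)[:k]
        return sum(nearest) / k

    return sorted(majority, key=mean_dist_to_nearest_minority)[:n_keep]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, F-score (F1) and Cohen's kappa,
    with the minority class (e.g. stroke) coded as the positive label 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    n = len(pairs)
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on positives
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    # Chance agreement from the row and column totals of the confusion matrix
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1, "kappa": kappa}


# Toy 2-D training data: 95 majority ("no stroke") vs 5 minority ("stroke").
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
minority = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(5)]

# SMOTE grows the minority class; NearMiss-1 shrinks the majority class.
smote_minority = minority + smote(minority, len(majority) - len(minority), k=3, rng=rng)
nm_majority = nearmiss1(majority, minority, n_keep=len(minority), k=3)
print(len(majority), len(smote_minority))  # 95 95
print(len(nm_majority), len(minority))     # 5 5
print(round(binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 1, 0])["kappa"], 2))  # 0.25
```

As in the procedure described above, resampling would be applied only to the training subset, so the test subset keeps its natural imbalance. Sensitivity and Kappa are more informative than accuracy in this setting, since a classifier that always predicts "no stroke" already scores high accuracy on unbalanced data.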

Publisher

Politeknik Statistika STIS

Cited by 4 articles.

1. Handling imbalanced medical datasets: review of a decade of research;Artificial Intelligence Review;2024-09-02

2. Unveiling the Potential of Random Undersampling in Geothermal Lithology Classification for Improved Geothermal Resource Exploration;SPE Nigeria Annual International Conference and Exhibition;2024-08-05

3. Prediction of Accident Risk Levels in Traffic Accidents Using Deep Learning and Radial Basis Function Neural Networks Applied to a Dataset with Information on Driving Events;Applied Sciences;2024-07-18

4. Shuffle Split-Edited Nearest Neighbor: A Novel Intelligent Control Model Compression for Smart Lighting in Edge Computing Environment;Smart Innovation, Systems and Technologies;2023