Author:
Alamsyah Anas Rulloh Budi, Anisa Salsabila Rahma, Belinda Nadira Sri, Setiawan Adi
Abstract
Unbalanced data are often encountered in practice and complicate the search for a suitable classification model. In medical data the imbalance arises because individuals who have a history of a disease are fewer than those who do not. We analyse the IFLS 5 data on the medical history of a set of patients, split 80:20 into training and test subsets. Both subsets are unbalanced, with only a small minority of patients who had a stroke. We apply the SMOTE and NearMiss methods to the training data, each of which transforms it into balanced data, and then carry out classification to compare how effectively the two methods address the imbalance. Comparing accuracy, F-score, Kappa, sensitivity, and specificity for the two methods, we conclude that NearMiss is better than SMOTE in balancing the data.
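The five measures named in the abstract can all be computed from a binary confusion matrix. The following is a minimal sketch, not the authors' code: the names `Confusion` and `evaluate` are hypothetical, and the counts in the usage lines are invented for illustration rather than taken from the IFLS 5 results. It treats the minority class (stroke) as the positive class.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix; the positive class is the minority (e.g. stroke)."""
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    fn: int  # positives predicted as negative
    tn: int  # negatives predicted as negative


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def evaluate(c: Confusion) -> dict:
    """Accuracy, sensitivity, specificity, F-score and Cohen's kappa."""
    n = c.tp + c.fp + c.fn + c.tn
    accuracy = _ratio(c.tp + c.tn, n)
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # recall on the positive class
    specificity = _ratio(c.tn, c.tn + c.fp)   # recall on the negative class
    precision = _ratio(c.tp, c.tp + c.fp)
    f_score = _ratio(2 * precision * sensitivity, precision + sensitivity)
    # Agreement expected by chance, from the row and column marginals.
    p_yes = _ratio(c.tp + c.fp, n) * _ratio(c.tp + c.fn, n)
    p_no = _ratio(c.tn + c.fn, n) * _ratio(c.tn + c.fp, n)
    p_e = p_yes + p_no
    kappa = _ratio(accuracy - p_e, 1 - p_e)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "kappa": kappa,
    }


# Hypothetical counts: a classifier that labels every case as negative
# on 1000 test cases of which 20 are positive.
all_negative = evaluate(Confusion(tp=0, fp=0, fn=20, tn=980))

# Hypothetical counts: a classifier that finds half of the positives.
mixed = evaluate(Confusion(tp=10, fp=10, fn=10, tn=70))
```

In the first hypothetical case accuracy is high while sensitivity, F-score, and Kappa are all zero, which is why a comparison on unbalanced data looks at several measures rather than accuracy alone.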
Publisher
Politeknik Statistika STIS
Cited by
4 articles.