SMOTE vs. KNNOR: An evaluation of oversampling techniques in machine learning

Published: 2023-06-23
ISSN: 2146-538X
Container-title: Gümüşhane Üniversitesi Fen Bilimleri Enstitüsü Dergisi
Language: tr

Author:

ABACI İsmet¹, YILDIZ Kazım²

Affiliation:

1. MARMARA ÜNİVERSİTESİ, FEN BİLİMLERİ ENSTİTÜSÜ

2. MARMARA ÜNİVERSİTESİ, TEKNOLOJİ FAKÜLTESİ, BİLGİSAYAR MÜHENDİSLİĞİ BÖLÜMÜ

Abstract

The increasing availability of big data has led to the development of applications that make human life easier. To process this data correctly, useful and valid information must be extracted from large data warehouses through the knowledge discovery in databases (KDD) process. Data mining is an important part of this process; it involves exploring data and developing models that extract previously unknown patterns. The quality of the data used by supervised machine learning algorithms plays a significant role in the success of their predictions. One factor that improves data quality is a balanced dataset, in which the classes are represented in roughly equal proportions. In practice, however, many datasets are imbalanced. To overcome this problem, oversampling techniques are used to generate synthetic data that is as close to real data as possible. In this study, we compared the performance of two oversampling techniques, SMOTE and KNNOR, on a variety of datasets using different machine learning algorithms. Our results showed that SMOTE and KNNOR did not always improve model accuracy; on many datasets, they reduced it. On certain datasets, however, both techniques increased accuracy. These results indicate that the effectiveness of oversampling depends on the specific dataset and machine learning algorithm, so it is crucial to assess these methods case by case to determine the best approach for a given dataset and algorithm.
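The oversampling idea described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it is a simplified, SMOTE-style interpolation written with the Python standard library only, and the function name `smote_like_oversample`, its parameters, and the toy data are hypothetical choices made for illustration. KNNOR is not shown. In practice a maintained library such as imbalanced-learn would likely be used instead.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy imbalanced data (assumed): 8 majority points and 3 minority points.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic minority points to match the majority class size.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "8 8"
```

Because every synthetic point lies on a line segment between two real minority samples, the new data stays inside the region the minority class already occupies. Whether this helps a given classifier is exactly what the study reports as dataset- and algorithm-dependent, so accuracy should be compared with and without oversampling on held-out data that contains no synthetic points.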

Publisher

Gumushane University Journal of Science and Technology Institute

Subject

General Engineering

References (28 articles)

1. Adekitan, A. I., & Salau, O. P. (2019). The impact of engineering students’ performance in the first three years on their graduation result using educational data mining. Heliyon, 5(2), e01250. https://doi.org/10.1016/j.heliyon.2019.e01250

2. Ashwin Srinivasan (1988). UCI Machine Learning Repository. Retrieved from http://archive.ics.uci.edu/ml

3. Asif, R., Merceron, A., & Pathan, M. K. (2014). Predicting Student Academic Performance at Degree Level: A Case Study. International Journal of Intelligent Systems and Applications, 7(1), 49–61. https://doi.org/10.5815/ijisa.2015.01.05

4. Balcı, M. A., Taşdemir, Ş., Ozmen, G., & Golcuk, A. (2022). Machine Learning-Based Detection of Sleep-Disordered Breathing Type Using Time and Time-Frequency Features. Biomedical Signal Processing and Control, 73, 103402. https://doi.org/10.1016/j.bspc.2021.103402

5. Yasar, A. (2022, November). Benchmarking analysis of CNN models for bread wheat varieties. European Food Research and Technology, 249. https://doi.org/10.1007/s00217-022-04172-y

Cited by 2 articles.

1. Gebelikte Anne Sağlığı Risk Gruplarının Tahminine Yönelik Makine Öğrenmesi Tabanlı Bir Karar Destek Sistem Tasarımı;Black Sea Journal of Engineering and Science;2024-05-15

2. Stacking ensemble based hyperparameters to diagnosing of heart disease: Future works;Results in Engineering;2024-03