A new hybrid approach based on genetic algorithm and support vector machine methods for hyperparameter optimization in synthetic minority over-sampling technique (SMOTE)

Published:2023 Issue:4 Volume:8 Page:9400-9415
ISSN:2473-6988
Container-title:AIMS Mathematics
Short-container-title:MATH

Author:

Akın Pelin

Abstract

A crucial problem when applying classification algorithms is class imbalance. In an imbalanced dataset, particularly a two-class one, one class has far more observations than the other, so the majority class dominates the minority class. The synthetic minority over-sampling technique (SMOTE) was developed to deal with the classification of imbalanced datasets. SMOTE increases the number of minority samples by interpolating between minority samples that lie close together. The algorithm has three critical hyperparameters: "k", "perc.over", and "perc.under". "perc.over" and "perc.under" determine the minority and majority class ratios, and "k" is the number of nearest neighbors used to create new minority class instances. Finding the best values for these hyperparameters is complicated. To address this issue, a hybrid of genetic algorithm (GA) and support vector machine (SVM) approaches was proposed for selecting the SMOTE hyperparameters. Three scenarios were created. Scenario 1 evaluates the SVM results without the SMOTE algorithm. Scenario 2 applies the SVM after SMOTE, without the GA. Scenario 3 applies the SVM after the SMOTE hyperparameters have been optimized with the GA. The study used two imbalanced datasets: drug use data and simulation data. The scenarios were then compared using model performance metrics, and the third scenario achieved the highest performance. The study shows that a genetic algorithm can optimize the class ratios and the k hyperparameter to improve the performance of the SMOTE algorithm.
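
The two mechanisms the abstract describes, SMOTE interpolation and a GA search over its hyperparameters, can be illustrated with a minimal Python sketch. This is not the author's implementation. The function names `smote_oversample` and `ga_search` are hypothetical, `perc_over` is assumed to mean "perc_over / 100 synthetic points per minority sample", and the `fitness` callback is a placeholder for whatever score the caller computes (in the paper's setting, an SVM performance metric obtained after applying SMOTE with the candidate parameters).

```python
import math
import random


def smote_oversample(minority, k=5, perc_over=200, rng=None):
    """Create synthetic minority samples by SMOTE-style interpolation.

    Assumption: perc_over / 100 synthetic points are generated per
    minority sample, so perc_over is treated as a multiple of 100.
    """
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    per_sample = perc_over // 100
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbours of x (Euclidean), excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # random position on the segment from x to nb
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def ga_search(fitness, bounds, pop_size=20, generations=30,
              mutation_rate=0.2, rng=None):
    """Maximise fitness(candidate) over integer tuples inside bounds.

    A simple elitist GA: truncation selection, uniform crossover,
    and random-reset mutation. bounds is a list of (low, high) pairs.
    """
    rng = rng or random.Random(0)
    population = [tuple(rng.randint(lo, hi) for lo, hi in bounds)
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            for g, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[g] = rng.randint(lo, hi)  # random-reset mutation
            children.append(tuple(child))
        population = parents + children
    return max(population, key=fitness)
```

To mirror the third scenario, a caller would pass bounds for (k, perc.over, perc.under) to `ga_search` and supply a `fitness` function that oversamples the training data with those values, fits a classifier, and returns a validation score. That evaluation step is deliberately left out here because it depends on the dataset and the SVM library used.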

Publisher

American Institute of Mathematical Sciences (AIMS)

Subject

General Mathematics

References: 39 articles.

1. A. Fernández, S. García, F. Herrera, Addressing the classification with imbalanced data: open problems and new challenges on class distribution, In: Lecture Notes in Computer Science, Heidelberg: Springer, 6678 (2011). https://doi.org/10.1007/978-3-642-21219-2_1

2. M. Liuzzi, P. A. Pelizari, C. Geiß, A. Masi, V. Tramutoli, H. Taubenböck, A transferable remote sensing approach to classify building structural types for seismic risk analyses: the case of Val d'Agri area (Italy), Bull. Earthq. Eng., 17 (2019), 4825–4853.

3. D. Devarriya, C. Gulati, V. Mansharamani, A. Sakalle, A. Bhardwaj, Unbalanced breast cancer data classification using novel fitness functions in genetic programming, Expert Syst. Appl., 140 (2020), 112866. https://doi.org/10.1016/j.eswa.2019.112866

4. S. Katoch, S. S. Chauhan, V. Kumar, A review on genetic algorithm: past, present, and future, Multimed. Tools Appl., 80 (2021), 8091–8126. https://doi.org/10.1007/s11042-020-10139-6

5. Y. L. Yuan, J. J. Ren, S. Wang, Z. X. Wang, X. K. Mu, W. Zhao, Alpine skiing optimization: A new bio-inspired optimization algorithm, Adv. Eng. Softw., 170 (2022), 103158. https://doi.org/10.1016/j.advengsoft.2022.103158

Cited by 4 articles.

1. Application of Machine Learning Techniques for Predicting Students’ Acoustic Evaluation in a University Library; Acoustics; 2024-07-25

2. A new experimental design to predict carbon dioxide emissions using Boruta feature selection and hybrid support vector regression techniques; International Journal of Global Warming; 2024

3. A Comprehensive Study of the Performances of Imbalanced Data Learning Methods with Different Optimization Techniques; Communications in Computer and Information Science; 2024

4. A Variable Step Crow Search Algorithm and Its Application in Function Problems; Biomimetics; 2023-08-28