Statistic Deviation Mode Balancer (SDMB): A novel sampling algorithm for imbalanced data
Author:
Alimoradi Mahmoud (1), Daliri Arman (2), Zabihimayvan Mahdieh (3), Sadeghi Reza (4)
Affiliation:
1. Shafagh Institute of Higher Education 2. Islamic Azad University 3. Central Connecticut State University 4. Marist College
Abstract
Correct classification is a critical element of supervised learning, and it depends first on having sound data: flawed data can be worse than no data at all. One of the most common problems in real-world data is class imbalance, which must be addressed before a classifier can achieve its best performance. The main weakness of existing balancing algorithms is that they duplicate minority samples and generate synthetic samples that make outliers part of the original data. The Statistic Deviation Mode Balancer (SDMB) algorithm addresses this problem by generating samples that adhere to the structure of the original data. SDMB uses the standard deviation and the mode of the minority class to generate data that closely resemble the original minority samples while moving away from the majority class. Using these two parameters, SDMB avoids outliers and generates clean data. The output is a balanced dataset that helps classifiers learn more effectively from the data. To support this claim, we tested several classifiers based on entirely different methods. We first balanced a range of datasets with SDMB and then, using these classifiers, compared the results with those of existing balancing algorithms. In these experiments, SDMB outperformed the competing algorithms, which suggests that it can be used in workflows on real datasets.
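The abstract gives only a high-level description of the method, so the following Python sketch is a loose illustration of the general idea rather than the SDMB algorithm itself. Everything specific in it is an assumption made for illustration: the histogram-based mode estimate, the Gaussian noise scaled by the minority standard deviation, the rule that keeps a candidate only if it lies closer to the minority mode than to the majority mean, and the hypothetical names `estimate_mode`, `oversample_sketch`, `scale`, and `max_rounds`.

```python
import numpy as np


def estimate_mode(column, bins=10):
    """Approximate the mode of a continuous feature as the centre of its fullest histogram bin."""
    counts, edges = np.histogram(column, bins=bins)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])


def oversample_sketch(X_min, X_maj, scale=0.5, seed=0, max_rounds=100):
    """Illustrative only: draw synthetic minority points near the minority mode.

    Assumptions (not taken from the paper): Gaussian noise around the
    per-feature mode, spread = scale * per-feature standard deviation, and a
    rejection rule that keeps a candidate only if it is closer to the
    minority mode than to the majority mean.
    """
    rng = np.random.default_rng(seed)
    n_needed = len(X_maj) - len(X_min)
    if n_needed <= 0:
        return X_min.copy()

    n_features = X_min.shape[1]
    mode = np.array([estimate_mode(X_min[:, j]) for j in range(n_features)])
    std = X_min.std(axis=0)
    maj_center = X_maj.mean(axis=0)

    kept, n_kept = [], 0
    for _ in range(max_rounds):
        # Candidates are centred on the minority mode, spread by the minority std.
        cand = rng.normal(loc=mode, scale=scale * std, size=(n_needed, n_features))
        d_min = np.linalg.norm(cand - mode, axis=1)
        d_maj = np.linalg.norm(cand - maj_center, axis=1)
        accepted = cand[d_min < d_maj]  # discard points that drift toward the majority
        kept.append(accepted)
        n_kept += len(accepted)
        if n_kept >= n_needed:
            break

    synthetic = np.vstack(kept)[:n_needed]
    # May return fewer than n_needed synthetic rows if max_rounds is exhausted.
    return np.vstack([X_min, synthetic])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_min = rng.normal(0.0, 1.0, size=(20, 2))   # toy minority class
    X_maj = rng.normal(5.0, 1.0, size=(100, 2))  # toy majority class
    balanced_minority = oversample_sketch(X_min, X_maj)
    print(balanced_minority.shape)
```

In this toy setup the original minority rows are kept unchanged and synthetic rows are appended until the minority class matches the majority class in size. The rejection rule is one simple way to express "moving away from the majority part"; the paper may define this step differently.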
Publisher
Springer Science and Business Media LLC