Statistic Deviation Mode Balancer (SDMB): A novel sampling algorithm for imbalanced data

Author:

Alimoradi Mahmoud (1), Daliri Arman (2), Zabihimayvan Mahdieh (3), Sadeghi Reza (4)

Affiliation:

1. Shafagh Institute of Higher Education

2. Islamic Azad University

3. Central Connecticut State University

4. Marist College

Abstract

Proper grouping by classification algorithms is a critical element of supervised learning, and it starts with having correct data: flawed data is worse than no data at all. One of the biggest problems inherent in real-world data is class imbalance, so balancing the data is the first step toward getting the best performance from a classifier on real datasets. The main weakness of existing balancing algorithms is that they duplicate minority samples or generate samples that turn outliers into part of the primary data. The Statistic Deviation Mode Balancer (SDMB) algorithm addresses this problem by generating samples that adhere to the structure of the original data. It produces data very similar to the original by using the standard deviation and the mode of the minority data while moving away from the majority class. Using these two parameters, SDMB avoids outliers and generates clean data. The output is a balanced dataset that helps classifiers learn from the data more effectively. To support this claim, we tested classifiers that rely on entirely different methods: we first balanced several datasets with our method and then, across these classifiers, compared the results with those of existing balancing algorithms. In these experiments, the proposed algorithm outperformed its competitors, which suggests it can be used in processing pipelines for real datasets.
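The abstract does not spell out the SDMB procedure, so the following is only a rough sketch of the general idea it describes (sampling near the minority mode, scaled by the minority standard deviation, while staying away from the majority class). It is not the authors' implementation. The function names, the histogram-based mode estimate, the `spread` parameter, and the distance-based rejection rule are all assumptions made for illustration.

```python
import numpy as np


def feature_modes(X, bins=10):
    """Estimate a per-feature mode as the center of the fullest histogram bin.

    Assumption: the paper's exact notion of "mode" for continuous
    features is not given in the abstract; a histogram is one option.
    """
    modes = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        counts, edges = np.histogram(X[:, j], bins=bins)
        k = int(counts.argmax())
        modes[j] = 0.5 * (edges[k] + edges[k + 1])
    return modes


def oversample_minority(X_min, X_maj, n_new, spread=0.5, seed=0, max_rounds=1000):
    """Draw synthetic minority samples near the minority mode.

    Hypothetical rule: each candidate is mode + u * std with u drawn
    uniformly from [-spread, spread] per feature, and it is kept only
    if it is at least as far from the majority mean as from the
    minority mode. May return fewer than n_new rows if max_rounds
    is exhausted.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    mode = feature_modes(X_min)
    std = X_min.std(axis=0)
    maj_center = X_maj.mean(axis=0)

    kept = np.empty((0, X_min.shape[1]))
    for _ in range(max_rounds):
        if len(kept) >= n_new:
            break
        offsets = rng.uniform(-spread, spread, size=(n_new, X_min.shape[1]))
        cand = mode + offsets * std
        d_min = np.linalg.norm(cand - mode, axis=1)
        d_maj = np.linalg.norm(cand - maj_center, axis=1)
        # Reject candidates that drift toward the majority class.
        kept = np.vstack([kept, cand[d_maj >= d_min]])
    return kept[:n_new]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_maj = rng.normal(0.0, 1.0, size=(200, 2))   # majority class
    X_min = rng.normal(10.0, 1.0, size=(20, 2))   # minority class
    X_new = oversample_minority(X_min, X_maj, n_new=180)
    X_balanced_min = np.vstack([X_min, X_new])
    print(X_balanced_min.shape)
```

Bounding the offsets by a fraction of the standard deviation is one simple way to keep synthetic points inside the dense region of the minority class, which is the outlier-avoidance property the abstract emphasizes. The published algorithm may use a different rule.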

Publisher

Springer Science and Business Media LLC
