Machine learning-based processing of unbalanced data sets for computer algorithms

Published:2023-01-01 Issue:1 Volume:13 Page:
ISSN:2299-1093
Container-title:Open Computer Science
language:en
Author:

Zhou Qingwei¹, Qi Yongjun²³, Tang Hailin²³, Wu Peng¹

Affiliation:

1. School of Information and Engineering, Sichuan Tourism University, Chengdu 610000, Sichuan, China

2. Faculty of Megadata and Computing, Guangdong Baiyun University, Guangzhou 510450, Guangdong, China

3. School of Information and Communication Technology, Mongolian University of Science and Technology, Bayanzurkh District, 13341, Ulaanbaatar, Mongolia

Abstract

The rapid development of technology makes it possible to collect large amounts of data, which contain both important information and various kinds of noise. Extracting useful knowledge from such data is a central task at the current stage of machine learning (ML). Unbalanced classification is an important topic in data mining and ML; it has attracted growing attention and is a relatively new challenge for academia and industry. The problem involves classifying data when data are insufficient or the class distribution is severely skewed. Because of the inherent complexity of unbalanced data sets, new algorithms and tools are needed to convert large amounts of raw data into useful information and knowledge effectively. An unbalanced data set is a special case of the classification problem in which the distribution across classes is uneven, which makes accurate classification difficult. This article introduces research on ML-based methods for processing unbalanced data sets, aiming to provide ideas and directions for handling such data in computer algorithms. It proposes a research strategy comprising data preprocessing, decision tree data classification, and the C4.5 algorithm, and uses these to run experiments on ML-based processing of unbalanced data sets. The experimental results show that the decision tree C4.5 algorithm reaches an accuracy of 94.80%, indicating that it is well suited to processing unbalanced data sets.
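
The abstract names C4.5 as the decision tree classifier. C4.5 chooses splits by gain ratio, which is information gain divided by split information. The sketch below is a minimal illustration of that criterion on a small unbalanced toy data set; the function names and the data are assumptions made here for illustration and are not taken from the article.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: parent entropy minus the weighted entropy of the children.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes that fragment the data into many branches.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: 8 majority ("neg") and 2 minority ("pos") examples.
labels = ["neg"] * 8 + ["pos"] * 2
flag = ["a"] * 8 + ["b"] * 2   # attribute that separates the two classes
noise = ["a", "b"] * 5         # attribute unrelated to the class

print(gain_ratio(flag, labels))
print(gain_ratio(noise, labels))
```

In this toy case the attribute that isolates the minority class scores higher than the unrelated one, so a gain-ratio-based tree would split on it first.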

Publisher

Walter de Gruyter GmbH

Subject

General Computer Science

Link

https://www.degruyter.com/document/doi/10.1515/comp-2022-0273/pdf

References: 30 articles.

1. A. Vollant, G. Balarac, and C. Corre, “Subgrid-scale scalar flux modelling based on optimal estimation theory and machine-learning procedures,” J. Turbul., vol. 18, no. 9, pp. 1–25, 2017.

2. T. Hunt, C. Song, R. Shokri, V. Shmatikov, and E. Witchel, “Privacy-preserving machine learning as a service,” Proc. Priv. Enhancing Technol., vol. 2018, no. 3, pp. 123–142, 2018.

3. Y. Li, H. Li, F. C. Pickard, B. Narayanan, F. Sen, M. K. Y. Chan, et al., “Machine learning force field parameters from Ab initio data,” J. Chem. Theory Comput., vol. 13, no. 9, pp. 4492–4503, 2017.

4. A. Karpatne, Z. Jiang, R. R. Vatsavai, S. Shekhar, and V. Kumar, “Monitoring land-cover changes: A machine-learning perspective,” IEEE Geosci. Remote. Sens. Mag., vol. 4, no. 2, pp. 8–21, 2016.

5. P. Plawiak, T. Sosnicki, M. Niedzwiecki, Z. Tabor, and K. Rzecki, “Hand body language gesture recognition based on signals from specialized glove and machine learning algorithms,” IEEE Trans. Ind. Inform., vol. 12, no. 3, pp. 1104–1113, 2016.

Cited by 3 articles.

1. Handling highly imbalanced data for classifying fatality of auto collisions using machine learning techniques;Journal of Management Analytics;2024-07-02

2. Prediction of the Need for Anticonvulsants in the Management of Orofacial Neuropathic Pain Using Machine Learning;Cureus;2024-04-24

3. Imbalanced Data Challenges and Their Resolution to Improve Fraud Detection in Credit Card Transactions;2024-02-20