Affiliation:
1. School of Control Science and Engineering, Tiangong University, Tianjin 300387, China
2. Tianjin Key Laboratory of Intelligent Control of Electrical Equipment, Tiangong University, Tianjin 300387, China
Abstract
Regression models require high-quality, balanced data to make accurate predictions. Real datasets, however, are often imbalanced in distribution, which directly degrades the prediction accuracy of regression models. To address imbalanced regression, we propose IRGAN (imbalanced regression generative adversarial network), an algorithm that takes into account the continuity of the target values and the correlation within the data and combines optimization with adversarial training. To account for the context information of the target data and the vanishing gradients of deep networks, we construct a generation module and design a composite loss function. In the early stages of training, the gap between generated and real samples is large, which easily leads to non-convergence. We therefore design a correction module that learns, from the real samples, the internal relationship between the state and action and the subsequent state and reward; it guides the generation module in generating samples and alleviates non-convergence during training. The corrected samples and the real samples are then fed into the discrimination module, and adversarial training is used to generate high-quality samples that balance the original data. The proposed method is tested on data from the fields of aerospace, biology, physics, and chemistry. The similarity between generated and real samples is measured from multiple perspectives to evaluate the quality of the generated samples, and the results demonstrate the superiority of the generation module. Regression prediction on samples balanced by IRGAN shows that the proposed algorithm improves prediction accuracy on imbalanced regression problems.
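To make the notion of an imbalanced continuous target concrete, the following minimal sketch counts how many samples fall into equal-width bins of the target and reports how many synthetic samples each bin would need to match the most populated one. It is not the IRGAN method: the equal-width binning, the function name `bin_deficits`, and the toy target values are all assumptions chosen for illustration.

```python
from collections import Counter


def bin_deficits(targets, n_bins=5):
    """Hypothetical helper: for each equal-width bin of the target range,
    return how many samples are missing relative to the fullest bin."""
    lo, hi = min(targets), max(targets)
    # Fall back to a width of 1.0 if every target value is identical.
    width = (hi - lo) / n_bins or 1.0
    # Clamp the index so the maximum value lands in the last bin.
    counts = Counter(min(int((t - lo) / width), n_bins - 1) for t in targets)
    largest = max(counts.values())
    return {b: largest - counts.get(b, 0) for b in range(n_bins)}


# Toy target values (assumed): most mass sits near the low end of the range.
targets = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 1.8, 4.9, 5.0]
deficits = bin_deficits(targets, n_bins=5)
print(deficits)
```

In this sketch, bins with a large deficit mark the sparse regions of the target range where a sample generator would have to contribute most to balance the data.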
Funder
Tianjin Research Innovation Project for Postgraduate Students