Research on Imbalanced Data Regression Based on Confrontation

Authors:

Liu Xiaowen 1,2, Tian Huixin 1,2

Affiliation:

1. School of Control Science and Engineering, Tiangong University, Tianjin 300387, China

2. Tianjin Key Laboratory of Intelligent Control of Electrical Equipment, Tiangong University, Tianjin 300387, China

Abstract

Regression models depend on high-quality, balanced data for accurate prediction. Real datasets, however, are often imbalanced in their distribution, which directly degrades the prediction accuracy of regression models. To address the imbalanced data regression problem, we propose IRGAN (imbalanced regression generative adversarial network), an algorithm that accounts for the continuity of the target value and the correlation of the data and that combines optimization with adversarial training. To handle the context information of the target data and the vanishing gradients of deep networks, we construct a generation module and design a composite loss function. In the early stages of training, the gap between the generated samples and the real samples is large, which easily leads to non-convergence. A correction module is therefore designed to learn the internal relationship between state and action, as well as between subsequent state and reward, in the real samples; it guides the generation module and alleviates non-convergence during training. The corrected samples and the real samples are then fed into the discriminant module. On this basis, adversarial training is used to generate high-quality samples that balance the original samples. The proposed method is tested in the fields of aerospace, biology, physics, and chemistry. The similarity between the generated samples and the real samples is measured from multiple perspectives to evaluate the quality of the generated samples, which demonstrates the superiority of the generation module. Regression prediction on the samples balanced by the IRGAN algorithm shows that the proposed algorithm improves prediction accuracy on the imbalanced data regression problem.
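To make the notion of imbalance in a continuous target concrete, the following minimal sketch bins the target values and reports how many additional samples each bin would need to match the most populated bin. It is not the IRGAN algorithm and is not taken from the paper: the function names `rebalance_plan` and `imbalance_ratio`, the equal-width binning, and the synthetic data are all assumptions made for illustration. In a pipeline of the kind the abstract describes, a generative model would be the component that supplies the missing samples.

```python
import numpy as np


def rebalance_plan(y, n_bins=10):
    """Bin a continuous target and compute each bin's shortfall.

    Hypothetical helper (not from the paper): returns the bin edges,
    the per-bin sample counts, and the number of extra samples each
    bin would need to match the most populated bin.
    """
    y = np.asarray(y, dtype=float)
    counts, edges = np.histogram(y, bins=n_bins)
    deficit = counts.max() - counts
    return edges, counts, deficit


def imbalance_ratio(counts):
    """Ratio of the largest to the smallest non-empty bin count."""
    nonzero = counts[counts > 0]
    return nonzero.max() / nonzero.min()


# Synthetic, illustrative target: a dense region near 0 and a rare region near 6.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(6.0, 0.5, 50)])

edges, counts, deficit = rebalance_plan(y, n_bins=8)
for lo, hi, c, d in zip(edges[:-1], edges[1:], counts, deficit):
    print(f"[{lo:6.2f}, {hi:6.2f})  count={c:4d}  needs={d:4d}")
print("imbalance ratio:", imbalance_ratio(counts))
```

Equal-width bins are the simplest choice here; a density estimate over the target would avoid the dependence on `n_bins`, at the cost of an extra bandwidth parameter.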

Funder

Tianjin Research Innovation Project for Postgraduate Students

Publisher

MDPI AG
