Affiliation:
1. School of Informatics, Xiamen University, Fujian, China
Abstract
Imbalanced samples are widespread and impair the generalization and fairness of models. Semi‐supervised learning can compensate for the scarcity of labeled samples, but selecting high‐quality pseudo‐labeled data is challenging. Unlike discrete labels, which can be matched one‐to‐one with points on a numerical axis, labels in regression tasks are continuous and cannot be selected directly. Moreover, the distribution of unlabeled data is itself imbalanced, which easily leads to an imbalanced distribution of pseudo‐labeled data and exacerbates the imbalance in the semi‐supervised dataset. To solve this problem, this article proposes a semi‐supervised imbalanced regression network (SIRN) consisting of two components: A, designed to learn the relationship between features and labels (targets), and B, dedicated to learning the relationship between features and target deviations. A target deviation function is introduced to measure target deviations under an imbalanced distribution, and a deviation matching strategy is designed to select continuous pseudo‐labels. Furthermore, an adaptive selection function is developed to mitigate the risk of skewed distributions caused by imbalanced pseudo‐labeled data. Finally, the proposed method is evaluated on two regression tasks. The results show a substantial reduction in prediction error, particularly in few‐shot regions, confirming the efficacy of the method in addressing imbalanced samples in regression tasks.
Funder
National Natural Science Foundation of China