Boosting semi‐supervised learning under imbalanced regression via pseudo‐labeling

Published: 2024-06-30, Volume 36, Issue 19
ISSN: 1532-0626
Journal: Concurrency and Computation: Practice and Experience
Language: English
Short journal title: Concurrency and Computation

Authors:

Zong Nannan¹, Su Songzhi¹, Zhou Changle¹

Affiliation:

1. School of Informatics, Xiamen University, Fujian, China

Abstract

Imbalanced samples are widespread and impair the generalization and fairness of models. Semi‐supervised learning can compensate for the scarcity of labeled samples, but selecting high‐quality pseudo‐labeled data is challenging. Unlike discrete labels, which can be matched one‐to‐one with points on a numerical axis, labels in regression tasks are continuous and cannot be chosen directly. Moreover, the unlabeled data are themselves imbalanced, which readily produces an imbalanced set of pseudo‐labeled data and exacerbates the imbalance of the semi‐supervised dataset. To address this problem, this article proposes a semi‐supervised imbalanced regression network (SIRN) consisting of two components: A, which learns the relationship between features and labels (targets), and B, which learns the relationship between features and target deviations. A target deviation function is introduced to measure target deviations under an imbalanced distribution, and a deviation matching strategy is designed to select continuous pseudo‐labels. Furthermore, an adaptive selection function is developed to mitigate the risk of skewed distributions caused by imbalanced pseudo‐labeled data. Finally, the proposed method is evaluated on two regression tasks. The results show a substantial reduction in prediction error, particularly in few‐shot regions, providing empirical evidence that the method is effective in addressing imbalanced samples in regression tasks.
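The abstract does not specify the form of SIRN's target deviation function, deviation matching strategy, or adaptive selection function, so the sketch below is not the authors' method. It only illustrates, under stated assumptions, the problem the abstract describes: when pseudo-labels for a continuous target are ranked by reliability alone, they tend to pile up in the already crowded part of the label range. The sketch assumes a hypothetical setting in which each unlabeled sample comes with a predicted target and a predicted deviation (a loose stand-in for the role the abstract gives component B), splits the target range into equal-width bins, and greedily takes the most reliable candidate from whichever bin currently holds the fewest labeled plus already-selected samples. All names (`select_pseudo_labels`, `max_deviation`, `n_bins`, the toy data) are hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate record: (sample_id, predicted_target, predicted_deviation).
# "predicted_deviation" is an assumed reliability score (smaller = more trustworthy);
# it is NOT the target deviation function defined in the paper.
Candidate = Tuple[str, float, float]


def bin_index(y: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a continuous target to one of n_bins equal-width bins on [lo, hi]."""
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and n_bins >= 1")
    frac = (y - lo) / (hi - lo)
    # Clamp so targets on or outside the range boundaries land in the edge bins.
    return min(n_bins - 1, max(0, int(frac * n_bins)))


def label_histogram(targets: List[float], lo: float, hi: float, n_bins: int) -> List[int]:
    """Count how many targets fall into each bin."""
    counts = [0] * n_bins
    for y in targets:
        counts[bin_index(y, lo, hi, n_bins)] += 1
    return counts


def select_pseudo_labels(
    candidates: List[Candidate],
    labeled_targets: List[float],
    lo: float,
    hi: float,
    n_bins: int,
    max_deviation: float,
    budget: int,
) -> List[Candidate]:
    """Greedy, distribution-aware pseudo-label selection (illustrative only)."""
    # Running per-bin counts: labeled samples plus pseudo-labels chosen so far.
    counts = label_histogram(labeled_targets, lo, hi, n_bins)

    # Discard unreliable candidates, then group the rest by the bin of their
    # predicted target, most reliable (smallest deviation) first.
    pools: List[List[Candidate]] = [[] for _ in range(n_bins)]
    for cand in candidates:
        if cand[2] <= max_deviation:
            pools[bin_index(cand[1], lo, hi, n_bins)].append(cand)
    for pool in pools:
        pool.sort(key=lambda c: c[2])

    selected: List[Candidate] = []
    while len(selected) < budget:
        open_bins = [b for b in range(n_bins) if pools[b]]
        if not open_bins:
            break  # no eligible candidates left
        # Take from the currently sparsest bin (ties go to the lower bin index).
        b = min(open_bins, key=lambda k: (counts[k], k))
        selected.append(pools[b].pop(0))
        counts[b] += 1
    return selected


if __name__ == "__main__":
    labeled = [4.8, 5.0, 5.1, 5.2, 4.9, 5.3, 0.5, 9.6]  # crowded around 5
    candidates = [
        ("u1", 5.0, 0.1), ("u2", 5.1, 0.2), ("u3", 4.9, 0.05),
        ("u4", 0.7, 0.3), ("u5", 9.4, 0.2), ("u6", 9.8, 0.9),
    ]
    balanced = select_pseudo_labels(candidates, labeled, 0.0, 10.0, 5, 0.5, 4)
    print([c[0] for c in balanced])  # ['u4', 'u5', 'u3', 'u1']

    # Baseline for comparison: rank eligible candidates by deviation alone.
    naive = sorted((c for c in candidates if c[2] <= 0.5), key=lambda c: c[2])[:4]
    print([c[0] for c in naive])  # ['u3', 'u1', 'u2', 'u5']
```

In this toy run, ranking by deviation alone takes three of the four pseudo-labels from the crowded middle bin, while the balanced selection first takes the two tail candidates. Once the sparse bins run out of eligible candidates, the greedy loop falls back to dense bins; a per-bin cap would be one simple way to prevent that.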

Funder

National Natural Science Foundation of China

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpe.8103
