Generalized Zero-Shot Image Classification via Partially-Shared Multi-Task Representation Learning
Published:2023-05-03
Issue:9
Volume:12
Page:2085
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics
Author:
Wang Gerui 1,2, Tang Sheng 1,2
Affiliation:
1. School of Computer Science and Engineering, Central South University, Changsha 410083, China
2. Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Central South University, Changsha 410083, China
Abstract
Generalized Zero-Shot Learning (GZSL) holds significant research importance as it enables the classification of samples from both seen and unseen classes. A prevailing approach to GZSL is to learn transferable representations that generalize well to both seen and unseen classes during testing. This approach encompasses two key concepts: discriminative representations and semantic-relevant representations. “Semantic-relevant” facilitates the transfer of semantic knowledge using pre-defined semantic descriptors, while “discriminative” is crucial for accurate category discrimination. However, these two concepts are arguably in conflict, as semantic descriptors are not specifically designed for image classification. Existing methods often struggle to balance these two aspects and neglect the conflict between them, leading to suboptimal generalization and transferability of the representations to unseen classes. To address this issue, we propose a novel partially-shared multi-task representation learning method, termed PS-GZSL, which jointly preserves complementary and sharable knowledge between these two concepts. Specifically, we first propose a novel perspective that treats the learning of discriminative and semantic-relevant representations as optimizing a discrimination task and a visual-semantic alignment task, respectively. Then, to learn more complete and generalizable representations, PS-GZSL explicitly factorizes visual features into task-shared and task-specific representations and introduces two advanced tasks: an instance-level contrastive discrimination task and a relation-based visual-semantic alignment task. Furthermore, PS-GZSL employs Mixture-of-Experts (MoE) with a dropout mechanism to prevent representation degeneration and integrates a conditional GAN (cGAN) to synthesize visual features for unseen classes. Extensive experiments on five widely used GZSL benchmark datasets yield competitive results and validate the effectiveness of PS-GZSL.
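The factorization described in the abstract can be illustrated with a small sketch. The abstract does not specify layer sizes, gating functions, or how the dropout is applied, so everything below is an assumption made for illustration: the names `gated_mixture`, `factorize`, and `build_params` are hypothetical, the experts are bias-free linear maps, and "dropout" is interpreted as randomly masking whole experts before the gate softmax. It is not the authors' implementation.

```python
import math
import random


def linear(x, weights):
    """Apply a bias-free linear map; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rows, cols, rng):
    """Small random matrix (assumed initialization, for illustration only)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def gated_mixture(x, experts, gate, rng, drop_prob=0.0):
    """Mix expert outputs with softmax gate weights.

    Assumption: each expert is masked out with probability `drop_prob`
    before the softmax, and at least one expert is always kept.
    """
    scores = linear(x, gate)
    keep = [rng.random() >= drop_prob for _ in experts]
    if not any(keep):
        keep[rng.randrange(len(experts))] = True
    kept = [i for i, k in enumerate(keep) if k]
    weights = softmax([scores[i] for i in kept])
    outputs = [linear(x, experts[i]) for i in kept]
    dim = len(outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, outputs)) for d in range(dim)]


def build_params(in_dim, out_dim, n_experts, rng):
    """One expert group and one gate per representation (hypothetical layout)."""
    names = ("shared", "discrimination", "alignment")
    return {
        name: {
            "experts": [init_matrix(out_dim, in_dim, rng) for _ in range(n_experts)],
            "gate": init_matrix(n_experts, in_dim, rng),
        }
        for name in names
    }


def factorize(x, params, rng, drop_prob=0.0):
    """Split a visual feature into task-shared and task-specific parts."""
    return {
        name: gated_mixture(x, p["experts"], p["gate"], rng, drop_prob)
        for name, p in params.items()
    }


rng = random.Random(0)
params = build_params(in_dim=8, out_dim=4, n_experts=3, rng=rng)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for a visual feature
parts = factorize(x, params, rng, drop_prob=0.5)

# Each task sees the shared part plus its own specific part.
disc_repr = parts["shared"] + parts["discrimination"]
align_repr = parts["shared"] + parts["alignment"]
```

In this sketch the two task representations share their first half (the task-shared part) and differ in the second half, which is one simple way to read "partially-shared". In a trained model the discrimination and alignment losses would each update the shared experts, while each set of task-specific experts would receive gradients only from its own task.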
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering