Leveraging Self-Distillation and Disentanglement Network to Enhance Visual–Semantic Feature Consistency in Generalized Zero-Shot Learning
Published: 2024-05-18
Journal: Electronics (ISSN: 2079-9292)
Volume: 13
Issue: 10
Page: 1977
Language: en
Author:
Liu Xiaoming 1,2,3; Wang Chen 1,2; Yang Guan 1,2; Wang Chunhua 4; Long Yang 5; Liu Jie 3,6; Zhang Zhiyuan 1,2
Affiliation:
1. School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007, China
2. Zhengzhou Key Laboratory of Text Processing and Image Understanding, Zhengzhou 450007, China
3. Research Center for Language Intelligence of China, Beijing 100089, China
4. School of Animation Academy, Huanghuai University, Zhumadian 463000, China
5. Department of Computer Science, Durham University, Durham DH1 3LE, UK
6. School of Information Science, North China University of Technology, Beijing 100144, China
Abstract
Generalized zero-shot learning (GZSL) aims to recognize both seen and unseen classes by training only on seen-class samples and auxiliary semantic descriptions. Recent state-of-the-art methods either infer unseen classes from semantic information or synthesize unseen-class features with generative models conditioned on semantic information; both strategies depend on correctly aligned visual–semantic features. However, these methods often overlook the inconsistency between original visual features and semantic attributes. Moreover, because of cross-modal dataset biases, the visual features that the model extracts and synthesizes may mismatch some semantic features, which hinders proper visual–semantic alignment. To address this issue, this paper proposes a GZSL framework that enhances visual–semantic consistency with a self-distillation and disentanglement network (SDDN). The network produces semantically consistent refined visual features and non-redundant semantic features. First, SDDN applies self-distillation to refine the visual features extracted and synthesized by the model. The visual–semantic features are then disentangled and aligned by a disentanglement network to strengthen their consistency. Finally, the consistent visual–semantic features are fused to jointly train a GZSL classifier. Extensive experiments show that the proposed method achieves competitive results on four challenging benchmark datasets (AWA2, CUB, FLO, and SUN).
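The abstract gives no implementation details, but the described pipeline can be illustrated with a minimal PyTorch sketch. Every module name, dimension, and loss weight below is a hypothetical assumption rather than the authors' code: a trainable student refiner is distilled toward a frozen teacher (self-distillation), a disentangler splits the refined feature into a semantic-consistent factor aligned with class attributes and a semantic-unrelated factor penalized for correlating with them, and the consistent factor is fused with the attributes as input to the GZSL classifier.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Disentangler(nn.Module):
    # Splits a refined visual feature into a semantic-consistent factor and
    # a semantic-unrelated (redundant) factor, both in attribute space.
    def __init__(self, vis_dim=2048, sem_dim=312, hidden=1024):
        super().__init__()
        self.consistent = nn.Sequential(
            nn.Linear(vis_dim, hidden), nn.ReLU(), nn.Linear(hidden, sem_dim))
        self.unrelated = nn.Sequential(
            nn.Linear(vis_dim, hidden), nn.ReLU(), nn.Linear(hidden, sem_dim))

    def forward(self, v):
        return self.consistent(v), self.unrelated(v)

class SDDNSketch(nn.Module):
    def __init__(self, vis_dim=2048, sem_dim=312):
        super().__init__()
        self.student = nn.Linear(vis_dim, vis_dim)   # trainable feature refiner
        self.teacher = nn.Linear(vis_dim, vis_dim)   # frozen distillation target
        for p in self.teacher.parameters():
            p.requires_grad_(False)
        self.disentangler = Disentangler(vis_dim, sem_dim)

    @torch.no_grad()
    def update_teacher(self, momentum=0.99):
        # Exponential-moving-average update, one common self-distillation scheme
        # (assumed here; the paper may use a different teacher construction).
        for pt, ps in zip(self.teacher.parameters(), self.student.parameters()):
            pt.mul_(momentum).add_(ps.detach(), alpha=1.0 - momentum)

    def forward(self, v, attrs):
        v_ref = self.student(v)
        with torch.no_grad():
            v_tgt = self.teacher(v)
        loss_distill = F.mse_loss(v_ref, v_tgt)          # self-distillation term
        h_con, h_unr = self.disentangler(v_ref)
        loss_align = F.mse_loss(h_con, attrs)            # visual-semantic alignment
        # Discourage the "unrelated" factor from carrying attribute information.
        loss_indep = F.cosine_similarity(h_unr, attrs, dim=-1).abs().mean()
        fused = torch.cat([h_con, attrs], dim=-1)        # input to GZSL classifier
        return fused, loss_distill + loss_align + 0.1 * loss_indep

# Toy usage: a batch of 32 ResNet-style features with 312-dim attribute vectors
# (312 matches CUB's attribute count; other datasets would differ).
model = SDDNSketch()
fused, loss = model(torch.randn(32, 2048), torch.randn(32, 312))
loss.backward()
model.update_teacher()

The same refinement and disentanglement steps would apply to generator-synthesized unseen-class features before the fused representations jointly train the final classifier.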
Funder
National Natural Science Foundation of China
Key Scientific Research Project of Higher Education Institutions in Henan Province
Postgraduate Education Reform and Quality Improvement Project of Henan Province
National Science and Technology Major Project
Research and Innovation Project of Graduate Students in Zhongyuan University of Technology
Special Fund Project for Basic Scientific Research of Zhongyuan University of Technology