Affiliation:
1. Institute of Computer Science and Technology, Peking University, Beijing, China
Abstract
Existing cross-media retrieval methods usually require that testing categories remain the same as training categories, so they cannot support retrieval of newly emerging categories. Inspired by zero-shot learning, this paper proposes zero-shot cross-media retrieval to address this problem, which aims to retrieve data of new categories across different media types. Zero-shot cross-media retrieval is challenging because it must handle not only the inconsistent semantics between new and known categories, but also the heterogeneous distributions across different media types. To address these challenges, this paper proposes Dual Adversarial Networks for Zero-shot Cross-media Retrieval (DANZCR), which, to the best of our knowledge, is the first approach to zero-shot cross-media retrieval. Our DANZCR approach consists of two GANs in a dual structure for common representation generation and original representation reconstruction, respectively, which capture the underlying data structures and strengthen the relations between input data and the semantic space to generalize across seen and unseen categories. Our DANZCR approach exploits word embeddings to learn common representations in the semantic space via an adversarial learning method, which preserves the inherent cross-media correlation and enhances knowledge transfer to new categories. Experiments on three widely-used cross-media retrieval datasets show the effectiveness of our approach.
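To make the dual structure described in the abstract more concrete, the sketch below illustrates one plausible reading of it: a forward GAN that generates common representations in the word-embedding (semantic) space and a dual GAN that reconstructs the original media features, each paired with its own discriminator. All dimensions, layer sizes, module names, and loss weighting are hypothetical placeholders, not the authors' implementation.

```python
# Minimal PyTorch sketch of a dual adversarial setup in the spirit of DANZCR.
# Hypothetical dimensions and architectures; not the paper's actual implementation.
import torch
import torch.nn as nn

FEAT_DIM, EMB_DIM = 4096, 300   # e.g. CNN image features and word-embedding size

class Generator(nn.Module):
    """Maps one representation to another (e.g. media feature -> semantic space)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, out_dim),
        )
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Scores whether a vector comes from the target distribution."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 512), nn.ReLU(),
            nn.Linear(512, 1),
        )
    def forward(self, x):
        return self.net(x)

# Forward GAN: original feature -> common representation in word-embedding space.
# Dual GAN: common representation -> reconstructed original feature.
gen_fwd, gen_dual = Generator(FEAT_DIM, EMB_DIM), Generator(EMB_DIM, FEAT_DIM)
disc_sem, disc_rec = Discriminator(EMB_DIM), Discriminator(FEAT_DIM)
bce = nn.BCEWithLogitsLoss()

features = torch.randn(8, FEAT_DIM)         # a batch of media features (toy data)
class_embeddings = torch.randn(8, EMB_DIM)  # word embeddings of their category labels

common = gen_fwd(features)                  # common representation in semantic space
reconstructed = gen_dual(common)            # dual reconstruction of original features

# Adversarial losses: align common representations with the semantic (word-embedding)
# distribution, and reconstructed features with the original feature distribution.
real, fake = torch.ones(8, 1), torch.zeros(8, 1)
loss_d = bce(disc_sem(class_embeddings), real) + bce(disc_sem(common.detach()), fake) \
       + bce(disc_rec(features), real) + bce(disc_rec(reconstructed.detach()), fake)
loss_g = bce(disc_sem(common), real) + bce(disc_rec(reconstructed), real) \
       + nn.functional.mse_loss(reconstructed, features)  # reconstruction term
```

At test time, retrieval across media types would be carried out in the learned common space (e.g. by nearest-neighbor search over `gen_fwd` outputs), including for categories unseen during training, since the space is anchored to word embeddings.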
Publisher
International Joint Conferences on Artificial Intelligence Organization
Cited by
13 articles.
1. ACE-BERT: Adversarial Cross-Modal Enhanced BERT for E-Commerce Retrieval;Lecture Notes in Computer Science;2024
2. Ranking on Heterogeneous Manifold for Multimodal Information Retrieval;2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom);2023-12-21
3. Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning;Frontiers of Computer Science;2023-12-02
4. A review on multimodal zero‐shot learning;WIREs Data Mining and Knowledge Discovery;2023-01-20
5. Multimodal Disentanglement Variational AutoEncoders for Zero-Shot Cross-Modal Retrieval;Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval;2022-07-06