Abstract
Cross-modal retrieval aims to retrieve samples of one modality using queries from another modality, and is an active research topic in the multimedia community. However, two main challenges, namely the heterogeneity gap and semantic interaction across different modalities, have not yet been addressed effectively. Reducing the heterogeneity gap improves cross-modal similarity measurement, while modeling cross-modal semantic interaction captures semantic correlations more accurately. To this end, this paper presents a novel end-to-end framework, called Dual Attention Generative Adversarial Network (DA-GAN). DA-GAN is an adversarial semantic representation model with a dual attention mechanism, i.e., intra-modal attention and inter-modal attention. Intra-modal attention focuses on the important semantic features within a modality, while inter-modal attention explores the semantic interaction between different modalities and thereby represents high-level semantic correlations more precisely. A dual adversarial learning strategy is designed to generate modality-invariant representations, which efficiently reduces cross-modal heterogeneity. Experiments on three widely used benchmarks show that DA-GAN outperforms competing methods.
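The abstract describes two attention components and an adversarial objective. The following is a minimal PyTorch sketch of that general idea, not the authors' implementation: intra-modal self-attention, inter-modal cross-attention, and a modality discriminator whose adversarial loss pushes both branches toward modality-invariant embeddings. All module names, dimensions, and shapes below are hypothetical.

```python
# Hedged sketch of the dual attention + adversarial learning idea (assumed design,
# not the paper's released code). Shapes and dimensions are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class IntraModalAttention(nn.Module):
    """Self-attention over the regions/words of a single modality."""
    def __init__(self, dim):
        super().__init__()
        self.query, self.key, self.value = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x):  # x: (batch, n_parts, dim)
        attn = torch.softmax(
            self.query(x) @ self.key(x).transpose(1, 2) / x.size(-1) ** 0.5, dim=-1)
        return attn @ self.value(x)  # attended features, same shape as x


class InterModalAttention(nn.Module):
    """Cross-attention: parts of one modality attend to parts of the other."""
    def __init__(self, dim):
        super().__init__()
        self.query, self.key, self.value = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x, y):  # x attends to y
        attn = torch.softmax(
            self.query(x) @ self.key(y).transpose(1, 2) / x.size(-1) ** 0.5, dim=-1)
        return attn @ self.value(y)


class ModalityDiscriminator(nn.Module):
    """Adversary that tries to tell image features from text features; fooling it
    encourages modality-invariant common-space representations."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(),
                                 nn.Linear(dim // 2, 1))

    def forward(self, feat):  # feat: (batch, dim)
        return self.net(feat)  # logit: image vs. text


# Toy forward pass with hypothetical shapes (36 image regions, 20 words, dim 256).
img = torch.randn(8, 36, 256)
txt = torch.randn(8, 20, 256)
intra = IntraModalAttention(256)
inter = InterModalAttention(256)
disc = ModalityDiscriminator(256)

img_feat = (intra(img) + inter(img, txt)).mean(dim=1)  # pooled common-space embedding
txt_feat = (intra(txt) + inter(txt, img)).mean(dim=1)

# Discriminator loss: distinguish the two modalities; the feature encoders would be
# trained with the opposite objective in an alternating (GAN-style) scheme.
adv_loss = (F.binary_cross_entropy_with_logits(disc(img_feat), torch.ones(8, 1)) +
            F.binary_cross_entropy_with_logits(disc(txt_feat), torch.zeros(8, 1)))
```

In a full pipeline, a retrieval loss (e.g., a triplet or ranking loss over `img_feat` and `txt_feat`) would be optimized jointly with the adversarial objective; that detail is omitted here since the abstract does not specify it.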
Subject
Computer Networks and Communications
Cited by
8 articles.
1. Cross-modal Image Retrieval Based on Scene Graphs;Proceedings of the 2024 Guangdong-Hong Kong-Macao Greater Bay Area International Conference on Education Digitalization and Computer Science;2024-07-26
2. Distribution Enhancement for Imbalanced Data with Generative Adversarial Network;Advanced Theory and Simulations;2024-06-22
3. Dual cycle generative adversarial networks for web search;Applied Soft Computing;2024-03
4. Adversarial learning based intra-modal classifier for cross-modal hashing retrieval;2023 International Conference on Cyber-Physical Social Intelligence (ICCSI);2023-10-20
5. Advances Techniques in Computer Vision and Multimedia;Future Internet;2023-09-01