Incomplete Cross-Modal Retrieval with Deep Correlation Transfer

Published: 2024-01-11
Volume: 20, Issue: 5, Pages: 1-21
ISSN: 1551-6857
Container title: ACM Transactions on Multimedia Computing, Communications, and Applications
Short container title: ACM Trans. Multimedia Comput. Commun. Appl.
Language: en

Authors:
Dan Shi (1), Lei Zhu (1), Jingjing Li (2), Guohua Dong (3), Huaxiang Zhang (1)

Affiliations:
1. Shandong Normal University, China
2. University of Electronic Science and Technology of China, China
3. Institute of Basic Medical Sciences, China
Abstract
Most cross-modal retrieval methods assume that multi-modal training data is complete and exhibits a one-to-one correspondence across modalities. In the real world, however, multi-modal data generally suffers from missing modality information due to the uncertainty of data collection and storage processes, which limits the practical application of existing cross-modal retrieval methods. Although some solutions generate the missing modality data from a single pseudo sample, the limited semantic information such a sample provides may lead to incomplete semantic restoration and sub-optimal retrieval results. To address this challenge, this article proposes an Incomplete Cross-Modal Retrieval with Deep Correlation Transfer (ICMR-DCT) method that robustly models incomplete multi-modal data and dynamically captures adjacency semantic correlations for cross-modal retrieval. Specifically, we construct an intra-modal graph attention-based auto-encoder that learns modality-invariant representations by performing semantic reconstruction through intra-modality adjacency correlation mining. We then design dual cross-modal alignment constraints that project multi-modal representations into a common semantic space, bridging the heterogeneous modality gap and enhancing the discriminability of the common representation. We further introduce semantic preservation to enhance adjacency semantic information and achieve cross-modal semantic correlation. Moreover, we propose a nearest-neighbor weighting integration strategy with cross-modal correlation transfer that generates the missing modality data according to inter-modality mapping relations and the adjacency correlations between each sample and its neighbors, improving the robustness of our method against incomplete multi-modal training data.
Extensive experiments on three widely used benchmark datasets demonstrate the superior performance of our method in cross-modal retrieval tasks under both complete and incomplete retrieval scenarios. The datasets and source code are available at https://github.com/shidan0122/DCT.git.
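The nearest-neighbor weighting idea described in the abstract can be illustrated with a minimal sketch: for a sample whose text modality is missing, an imputed feature is formed as a similarity-weighted average of the text features of its k nearest neighbors in the observed (image) modality. This is a simplified, assumed reading of the strategy, not the authors' exact implementation (the paper's version additionally uses learned inter-modality mappings and correlation transfer); the function name and parameters here are hypothetical.

```python
import numpy as np

def generate_missing_modality(img_feats, txt_feats, missing_idx, k=3):
    """Illustrative nearest-neighbor weighting (hypothetical helper, not
    the paper's exact method): fill each sample's missing text feature
    with a softmax-weighted average of the text features of its k nearest
    neighbors, where neighbors are found in the observed image modality."""
    observed = [i for i in range(len(img_feats)) if i not in set(missing_idx)]
    filled = txt_feats.copy()
    for i in missing_idx:
        q = img_feats[i]
        obs = img_feats[observed]
        # cosine similarity between the query image and all observed images
        sims = obs @ q / (np.linalg.norm(obs, axis=1) * np.linalg.norm(q) + 1e-8)
        top = np.argsort(sims)[-k:]          # indices of k most similar neighbors
        w = np.exp(sims[top]) / np.exp(sims[top]).sum()  # softmax weights
        # weighted integration of the neighbors' text features
        filled[i] = w @ txt_feats[[observed[j] for j in top]]
    return filled
```

In this simplified form, a sample's generated text feature is a convex combination of its neighbors' text features, so semantics from several related samples are integrated rather than copied from a single pseudo sample.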
Funder
National Natural Science Foundation of China
Natural Science Foundation of Shandong Province
Taishan Scholar Foundation of Shandong Province
CCF-Baidu Open Fund
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture
References: 57 articles.
Cited by: 1 article.
1. Semantic Reconstruction Guided Missing Cross-modal Hashing. 2024 International Joint Conference on Neural Networks (IJCNN), 2024-06-30.