Incomplete Cross-Modal Retrieval with Deep Correlation Transfer

Published: 2024-01-11
Volume: 20, Issue: 5, Pages: 1-21
ISSN: 1551-6857
Container title: ACM Transactions on Multimedia Computing, Communications, and Applications
Short container title: ACM Trans. Multimedia Comput. Commun. Appl.
Language: en

Authors:
Dan Shi (1), Lei Zhu (1), Jingjing Li (2), Guohua Dong (3), Huaxiang Zhang (1)

Affiliations:
1. Shandong Normal University, China
2. University of Electronic Science and Technology of China, China
3. Institute of Basic Medical Sciences, China
Abstract
Most cross-modal retrieval methods assume that multi-modal training data is complete and exhibits a one-to-one correspondence across modalities. In the real world, however, multi-modal data generally suffers from missing modality information due to the uncertainty of data collection and storage processes, which limits the practical application of existing cross-modal retrieval methods. Although some solutions generate the missing modality data from a single pseudo sample, the limited semantic information such a sample provides may lead to incomplete semantic restoration and sub-optimal retrieval results. To address this challenge, this article proposes an Incomplete Cross-Modal Retrieval with Deep Correlation Transfer (ICMR-DCT) method that robustly models incomplete multi-modal data and dynamically captures adjacency semantic correlations for cross-modal retrieval. Specifically, we construct an intra-modal graph attention-based auto-encoder that learns modality-invariant representations by performing semantic reconstruction through intra-modality adjacency correlation mining. We then design dual cross-modal alignment constraints that project multi-modal representations into a common semantic space, bridging the heterogeneous modality gap and enhancing the discriminability of the common representation. We further introduce semantic preservation to enhance adjacency semantic information and achieve cross-modal semantic correlation. Moreover, we propose a nearest-neighbor weighting integration strategy with cross-modal correlation transfer that generates the missing modality data according to inter-modality mapping relations and the adjacency correlations between each sample and its neighbors, improving the robustness of our method against incomplete multi-modal training data.
Extensive experiments on three widely used benchmark datasets demonstrate the superior performance of our method in cross-modal retrieval tasks under both complete and incomplete retrieval scenarios. The datasets and source code are available at https://github.com/shidan0122/DCT.git.
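The nearest-neighbor weighting idea described in the abstract can be illustrated with a minimal sketch: for a sample whose text modality is missing, an imputed feature is formed as a similarity-weighted average of the text features of its k nearest neighbors in the observed (image) modality. This is a simplified, assumed reading of the strategy, not the authors' exact implementation (the paper's version additionally uses learned inter-modality mappings and correlation transfer); the function name and parameters here are hypothetical.

```python
import numpy as np

def generate_missing_modality(img_feats, txt_feats, missing_idx, k=3):
    """Illustrative nearest-neighbor weighting (hypothetical helper, not
    the paper's exact method): fill each sample's missing text feature
    with a softmax-weighted average of the text features of its k nearest
    neighbors, where neighbors are found in the observed image modality."""
    observed = [i for i in range(len(img_feats)) if i not in set(missing_idx)]
    filled = txt_feats.copy()
    for i in missing_idx:
        q = img_feats[i]
        obs = img_feats[observed]
        # cosine similarity between the query image and all observed images
        sims = obs @ q / (np.linalg.norm(obs, axis=1) * np.linalg.norm(q) + 1e-8)
        top = np.argsort(sims)[-k:]          # indices of k most similar neighbors
        w = np.exp(sims[top]) / np.exp(sims[top]).sum()  # softmax weights
        # weighted integration of the neighbors' text features
        filled[i] = w @ txt_feats[[observed[j] for j in top]]
    return filled
```

In this simplified form, a sample's generated text feature is a convex combination of its neighbors' text features, so semantics from several related samples are integrated rather than copied from a single pseudo sample.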
Funder
National Natural Science Foundation of China
Natural Science Foundation of Shandong Province
Taishan Scholar Foundation of Shandong Province
CCF-Baidu Open Fund
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture
References: 57 articles.
Cited by: 1 article.
1. Semantic Reconstruction Guided Missing Cross-modal Hashing. 2024 International Joint Conference on Neural Networks (IJCNN), 2024-06-30.