Incomplete Cross-Modal Retrieval with Deep Correlation Transfer

Author:

Shi Dan1ORCID,Zhu Lei1ORCID,Li Jingjing2ORCID,Dong Guohua3ORCID,Zhang Huaxiang1ORCID

Affiliation:

1. Shandong Normal University, China

2. University of Electronic Science and Technology of China, China

3. Institute of Basic Medical Sciences, China

Abstract

Most cross-modal retrieval methods assume the multi-modal training data is complete and has a one-to-one correspondence. However, in the real world, multi-modal data generally suffers from missing modality information due to the uncertainty of data collection and storage processes, which limits the practical application of existing cross-modal retrieval methods. Although some solutions have been proposed to generate the missing modality data using a single pseudo sample, this may lead to incomplete semantic restoration and sub-optimal retrieval results due to the limited semantic information it provides. To address this challenge, this article proposes an Incomplete Cross-Modal Retrieval with Deep Correlation Transfer (ICMR-DCT) method that can robustly model incomplete multi-modal data and dynamically capture the adjacency semantic correlation for cross-modal retrieval. Specifically, we construct intra-modal graph attention-based auto-encoder to learn modality-invariant representations by performing semantic reconstruction through intra-modality adjacency correlation mining. Then, we design dual cross-modal alignment constraints to project multi-modal representations into a common semantic space, thus bridging the heterogeneous modality gap and enhancing the discriminability of the common representation. We further introduce semantic preservation to enhance adjacency semantic information and achieve cross-modal semantic correlation. Moreover, we propose a nearest-neighbor weighting integration strategy with cross-modal correlation transfer to generate the missing modality data according to inter-modality mapping relations and adjacency correlations between each sample and its neighbors, which improves the robustness of our method against incomplete multi-modal training data. Extensive experiments on three widely tested benchmark datasets demonstrate the superior performance of our method in cross-modal retrieval tasks under both complete and incomplete retrieval scenarios. Our used datasets and source codes are available at https://github.com/shidan0122/DCT.git .

Funder

National Natural Science Foundation of China

Natural Science Foundation of Shandong Province

Taishan Scholar Foundation of Shandong Province

CCF-Baidu Open Fund

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications,Hardware and Architecture

Reference57 articles.

1. Multi-feature, multi-modal, and multi-source social event detection: A comprehensive survey

2. Galen Andrew, Raman Arora, Jeff A. Bilmes, and Karen Livescu. 2013. Deep canonical correlation analysis. In Proceedings of ICML, Vol. 28. 1247–1255.

3. Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2014. Spectral networks and locally connected networks on graphs. In Proceedings of ICLR.

4. Min Cao, Shiping Li, Juntao Li, Liqiang Nie, and Min Zhang. 2022. Image-text retrieval: A survey on recent research and development. In Proceedings of IJCAI. 5410–5417.

5. Adversarial Examples Generation for Deep Product Quantization Networks on Image Retrieval

Cited by 1 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Semantic Reconstruction Guided Missing Cross-modal Hashing;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3