Affiliation:
1. State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China
2. Beijing Key Laboratory of Big Data in Security & Protection Industry, Beijing 100024, China
Abstract
Cross-modal retrieval aims to fuse information across modalities, imitate human learning, and advance the field. While previous reviews have focused primarily on binary and real-valued coding methods, few cover techniques grounded in deep representation learning. In this paper, we concentrate on harmonizing cross-modal representation learning with full-cycle modeling of high-level semantic associations between vision and language, diverging from traditional statistical methods. We systematically categorize and summarize the challenges and open issues in deploying current technologies, and we investigate the cross-modal retrieval pipeline, including pre-processing, feature engineering, pre-training tasks, encoding, cross-modal interaction, decoding, model optimization, and a unified architecture. Furthermore, we present benchmark datasets and evaluation metrics to help researchers keep pace with advances in cross-modal retrieval. By incorporating recent innovative works, we offer a perspective on potential future advancements in cross-modal retrieval.
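The pipeline stages named in the abstract (encoding into a shared space, cross-modal interaction, and ranking) can be illustrated with a toy dual-encoder sketch. Everything below is an illustrative assumption, not the survey's method: the linear projections stand in for learned encoders, and `encode`/`retrieve` are hypothetical helper names.

```python
# Toy dual-encoder sketch of a cross-modal retrieval pipeline:
# encode both modalities into a shared space, then rank a gallery
# by cosine similarity to the query. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for learned encoders: linear projections into a shared 8-d space.
W_img = rng.normal(size=(16, 8))   # image features (16-d) -> shared space
W_txt = rng.normal(size=(32, 8))   # text features (32-d)  -> shared space

def encode(x, W):
    """Project into the shared space and L2-normalize (real-valued coding)."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def retrieve(query_txt, gallery_img):
    """Cross-modal interaction via cosine similarity; return ranked indices."""
    sims = encode(query_txt, W_txt) @ encode(gallery_img, W_img).T
    return np.argsort(-sims, axis=-1)

# Toy gallery of 5 images and one text query.
gallery = rng.normal(size=(5, 16))
query = rng.normal(size=(1, 32))
ranking = retrieve(query, gallery)
print(ranking[0])  # gallery indices, best match first
```

In a real system the projections would be replaced by trained vision and language encoders, and the similarity step may be replaced by richer cross-modal interaction (e.g., cross-attention) before decoding.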
Funder
National Key Research and Development Program of China
Fundamental Research Funds for the Central Universities
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
References: 108 articles.
1. Kaur: "Comparative analysis on cross-modal information retrieval: A review," Comput. Sci. Rev., 2021.
2. Ramachandram: "Deep multimodal learning: A survey on recent advances and trends," IEEE Signal Process. Mag., 2017.
3. Feng, F., Wang, X., and Li, R.: "Cross-modal retrieval with correspondence autoencoder," Proceedings of the 22nd ACM International Conference on Multimedia, Lisboa, Portugal, 14 January 2022.
4. Wang, K., Yin, Q., Wang, W., Wu, S., and Wang, L.: "A comprehensive survey on cross-modal retrieval," arXiv, 2016.
5. Peng: "Cross-media analysis and reasoning: Advances and directions," Front. Inf. Technol. Electron. Eng., 2017.
Cited by: 3 articles.
1. "Comparative Study of Object Recognition Utilizing Machine Learning Techniques," 2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE), 9 May 2024.
2. "Deep cross-modal hashing with contrast learning and feature fusion," 2023 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML), 3 November 2023.
3. "Explainable Image Classification: The Journey So Far and the Road Ahead," AI, 1 August 2023.