A Survey of Full-Cycle Cross-Modal Retrieval: From a Representation Learning Perspective

Published: 2023-04-04 Issue: 7 Volume: 13 Page: 4571
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Wang Suping¹, Zhu Ligu¹,², Shi Lei¹, Mo Hao¹, Tan Songfu¹

Affiliation:

1. State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China

2. Beijing Key Laboratory of Big Data in Security & Protection Industry, Beijing 100024, China

Abstract

Cross-modal retrieval aims to elucidate information fusion, imitate human learning, and advance the field. Although previous reviews have primarily focused on binary and real-value coding methods, there is a scarcity of techniques grounded in deep representation learning. In this paper, we concentrated on harmonizing cross-modal representation learning and the full-cycle modeling of high-level semantic associations between vision and language, diverging from traditional statistical methods. We systematically categorized and summarized the challenges and open issues in implementing current technologies and investigated the pipeline of cross-modal retrieval, including pre-processing, feature engineering, pre-training tasks, encoding, cross-modal interaction, decoding, model optimization, and a unified architecture. Furthermore, we propose benchmark datasets and evaluation metrics to assist researchers in keeping pace with cross-modal retrieval advancements. By incorporating recent innovative works, we offer a perspective on potential advancements in cross-modal retrieval.

Funder

National Key Research and Development Program of China

Fundamental Research Funds for the Central Universities

National Key R&D Program of China

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/7/4571/pdf

Reference: 108 articles.

1. Comparative analysis on cross-modal information retrieval: A review;Kaur;Comput. Sci. Rev.,2021

2. Deep multimodal learning: A survey on recent advances and trends;Ramachandram;IEEE Signal Process. Mag.,2017

3. Feng, F., Wang, X., and Li, R. (2022, January 14). Cross-modal retrieval with correspondence autoencoder. Proceedings of the 22nd ACM International Conference on Multimedia, Lisboa, Portugal.

4. Wang, K., Yin, Q., Wang, W., Wu, S., and Wang, L. (2016). A comprehensive survey on cross-modal retrieval. arXiv.

5. Cross-media analysis and reasoning: Advances and directions;Peng;Front. Inf. Technol. Electron. Eng.,2017

Cited by 3 articles.

1. Comparative Study of Object Recognition Utilizing Machine Learning Techniques;2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE);2024-05-09

2. Deep cross-modal hashing with contrast learning and feature fusion;2023 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML);2023-11-03

3. Explainable Image Classification: The Journey So Far and the Road Ahead;AI;2023-08-01