Level-wise aligned dual networks for text–video retrieval

Published:2022-07-07 Issue:1 Volume:2022
ISSN:1687-6180
Container-title:EURASIP Journal on Advances in Signal Processing
language:en
Short-container-title:EURASIP J. Adv. Signal Process.

Author:

Lin Qiubin, Cao Wenming, He Zhiquan

Abstract

The vast number of videos on the Internet makes efficient and accurate text–video retrieval increasingly important. Current methods align video and text in a high-dimensional space. However, a high-dimensional space cannot fully exploit the different levels of information in videos and text. In this paper, we put forward a method called level-wise aligned dual networks (LADN) for text–video retrieval. LADN uses four common latent spaces to improve retrieval performance and a semantic concept space to increase the interpretability of the model. Specifically, LADN first extracts different levels of information, including global, local, temporal, and spatial–temporal information, from videos and text. These features are then mapped into four different latent spaces and one semantic space. Finally, LADN aligns the different levels of information in their respective spaces. Extensive experiments on three widely used datasets, MSR-VTT, VATEX, and TRECVID AVS 2016-2018, demonstrate that the proposed approach outperforms several state-of-the-art text–video retrieval approaches.
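
The level-wise idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature dimensions, the random linear projections, the cosine similarity, and the plain sum over levels are all assumptions made for illustration, and the semantic concept space is left out. The names `LEVELS`, `make_projections`, and `levelwise_similarity` are hypothetical.

```python
import numpy as np

# The four levels named in the abstract; each gets its own common latent space.
LEVELS = ("global", "local", "temporal", "spatial_temporal")


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def make_projections(video_dim, text_dim, latent_dim, rng):
    """Assumption: one random linear map per modality and level (untrained stand-ins)."""
    return {
        level: (
            rng.standard_normal((video_dim, latent_dim)) / np.sqrt(video_dim),
            rng.standard_normal((text_dim, latent_dim)) / np.sqrt(text_dim),
        )
        for level in LEVELS
    }


def levelwise_similarity(video_feats, text_feats, projections):
    """Return a (num_texts, num_videos) matrix of summed per-level cosine similarities."""
    total = None
    for level in LEVELS:
        w_video, w_text = projections[level]
        v = l2_normalize(video_feats[level] @ w_video)  # videos in this level's space
        t = l2_normalize(text_feats[level] @ w_text)    # texts in this level's space
        sim = t @ v.T
        total = sim if total is None else total + sim   # assumed fusion: plain sum
    return total


# Toy data: 5 videos and 3 text queries with random features at every level.
rng = np.random.default_rng(0)
projections = make_projections(video_dim=16, text_dim=12, latent_dim=8, rng=rng)
video_feats = {level: rng.standard_normal((5, 16)) for level in LEVELS}
text_feats = {level: rng.standard_normal((3, 12)) for level in LEVELS}

scores = levelwise_similarity(video_feats, text_feats, projections)
ranking = np.argsort(-scores, axis=1)  # per query: video indices, best match first
```

In a trained model the projections would be learned so that matching text–video pairs score higher than mismatched ones at every level; here they are random, so the ranking only demonstrates the data flow.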

Funder

National Natural Science Foundation of China

Fundamental Research Foundation of Shenzhen

Publisher

Springer Science and Business Media LLC

Subject

General Medicine

Link

https://link.springer.com/content/pdf/10.1186/s13634-022-00887-y.pdf
