Cross-Modal Interaction Network for Video Moment Retrieval
Published:2023-06-30
Issue:08
Volume:37
Page:
ISSN:0218-0014
Container-title:International Journal of Pattern Recognition and Artificial Intelligence
language:en
Short-container-title:Int. J. Patt. Recogn. Artif. Intell.
Author:
Ping Shen1,
Jiang Xiao1,
Tian Zean1,
Cao Ronghui1,
Chi Weiming1,
Yang Shenghong1
Affiliation:
1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, P. R. China
Abstract
Video moment retrieval aims to locate the target moment in an untrimmed video that best matches the semantics of a sentence query. Existing methods mainly rely on two separate modules: one learns intra-modal relations to understand video and query contents, and the other explores inter-modal interactions to build a semantic bridge between video and language. However, intra-modal relation information can easily be overlooked when capturing inter-modal interactions. In fact, intra-modal relations and inter-modal interactions can be learned simultaneously within a unified module so that video and sentence guide each other. To this end, we propose a Cross-Modal Interaction Network (CMIN) for video moment retrieval that jointly explores the intra-modal relations and inter-modal interactions between video frames and query words. In CMIN, a query-guided channel attention module is designed to suppress query-irrelevant visual features and enhance crucial contents; a cross-attention module then simultaneously considers intra-modal relations within each modality and fine-grained inter-modal interactions between frames and words to enhance the semantic relevance between video and sentence query. Experiments on two public datasets (Charades-STA and TACoS) demonstrate the superiority of our method over state-of-the-art methods.
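The two modules named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract gives no formulas, so the mean pooling, the sigmoid gate, the absence of learned projections, and the choice to attend over the concatenated frame and word tokens are all assumptions, and the function names `query_guided_channel_attention` and `joint_cross_attention` are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def query_guided_channel_attention(frames, words):
    """Rescale each visual channel by a query-derived gate in (0, 1)."""
    channels = len(frames[0])
    # Assumption: mean-pool the word features into one query vector.
    pooled = [sum(w[c] for w in words) / len(words) for c in range(channels)]
    # Assumption: a plain sigmoid gate; a trained model would use learned layers.
    gate = [1.0 / (1.0 + math.exp(-p)) for p in pooled]
    return [[f[c] * gate[c] for c in range(channels)] for f in frames]

def joint_cross_attention(frames, words):
    """Attend over frames and words together, so every token sees both
    its own modality (intra-modal) and the other one (inter-modal)."""
    tokens = frames + words
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        outputs.append(
            [sum(w * key[c] for w, key in zip(weights, tokens)) for c in range(dim)]
        )
    return outputs[: len(frames)], outputs[len(frames):]

# Toy inputs: 2 frames and 3 words, each with 3 feature channels.
frames = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
words = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [1.0, 0.0, 0.0]]

gated = query_guided_channel_attention(frames, words)
new_frames, new_words = joint_cross_attention(gated, words)
```

Attending over the concatenated sequence is only one way to learn both kinds of relation in a single module; it is used here because it keeps the sketch short, not because the abstract specifies it.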
Funder
National Natural Science Foundation of China
GHfund A
National Key R&D Program of China
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
1 article.
1. A Rate Control Scheme for VVC Intercoding Using a Linear Model; International Journal of Pattern Recognition and Artificial Intelligence; 2024-02