VMLH: Efficient Video Moment Location via Hashing
Published: 2023-01-13
Issue: 2
Volume: 12
Page: 420
ISSN: 2079-9292
Container-title: Electronics
Language: en
Author:
Tan Zhifang, Dong Fei, Liu Xinfang, Li Chenglong, Nie Xiushan
Abstract
Video moment location by query is a hot topic in video understanding. However, most existing methods ignore the importance of location efficiency in practical application scenarios: the video and the query sentence must be fed into the network together at retrieval time, which leads to low efficiency. To address this issue, we propose an efficient video moment location method via hashing (VMLH). In the proposed method, query sentences and video clips are converted into hash codes and hash code sets, respectively, in a way that preserves the semantic similarity between query sentences and video clips. A location prediction network is designed to predict the corresponding timestamp from the similarity among hash codes, so videos do not need to be fed into the network during retrieval and location. Furthermore, unlike existing methods, which require complex interaction and fusion between video and query sentences, the proposed VMLH method needs only a simple XOR operation among codes to locate the video moment with high efficiency. This work lays the foundation for fast video clip positioning and makes large-scale video clip positioning practical. Experimental results on two public datasets demonstrate the effectiveness of the method.
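The XOR-based matching described in the abstract reduces retrieval to bitwise operations on precomputed codes: XOR sets a bit exactly where two hash codes differ, so the popcount of the XOR result is the Hamming distance. The sketch below, which is not from the paper, illustrates this matching step in Python; the function name `hamming_similarity`, the 64-bit code length, and the NumPy representation are illustrative assumptions, and VMLH's learned hash functions and location prediction network are separate components not shown here.

```python
import numpy as np

def hamming_similarity(query_code: int, clip_codes: np.ndarray, n_bits: int = 64) -> np.ndarray:
    """Score each clip hash code against the query hash code.

    XOR marks the differing bits, the popcount of the XOR result is the
    Hamming distance, and (n_bits - distance) / n_bits maps it to a
    similarity in [0, 1]. (Illustrative sketch; codes assumed 64-bit.)
    """
    diff = np.bitwise_xor(clip_codes, np.uint64(query_code))
    # Popcount: view each 64-bit word as 8 bytes, unpack to bits, sum per code.
    distances = np.unpackbits(diff.view(np.uint8)).reshape(len(clip_codes), -1).sum(axis=1)
    return (n_bits - distances) / n_bits

# Toy usage: a set of 100 clip codes for one video, scanned against a query.
rng = np.random.default_rng(0)
clip_codes = rng.integers(0, 2**63, size=100, dtype=np.uint64)
query_code = int(clip_codes[42])            # pretend the query hashes near clip 42
similarities = hamming_similarity(query_code, clip_codes)
print(int(np.argmax(similarities)))         # -> 42; such scores would feed the location predictor
```

Because the clip hash codes can be computed and stored offline, the retrieval-time cost is just this XOR/popcount scan, which is what gives the hashing approach its efficiency advantage over methods that fuse video and query features inside a network at query time.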
Funder
National Natural Science Foundation of China; Shandong Provincial Natural Science Foundation for Distinguished Young Scholars; Shandong Provincial Natural Science Foundation; Taishan Scholar Project of Shandong Province
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
1 article.