VMLH: Efficient Video Moment Location via Hashing
Published: 2023-01-13
Issue: 2
Volume: 12
Page: 420
ISSN: 2079-9292
Container-title: Electronics
Language: en
Author:
Tan Zhifang, Dong Fei, Liu Xinfang, Li Chenglong, Nie Xiushan
Abstract
Video moment location by query is a hot topic in video understanding. However, most existing methods ignore the importance of location efficiency in practical application scenarios: the video and the query sentence must be fed into the network together at retrieval time, which leads to low efficiency. To address this issue, we propose an efficient video moment location method via hashing (VMLH). In the proposed method, query sentences and video clips are converted into hash codes and hash code sets, respectively, in a way that preserves the semantic similarity between query sentences and video clips. A location prediction network is designed to predict the corresponding timestamp from the similarity among hash codes, so videos do not need to be fed into the network during retrieval and location. Furthermore, unlike existing methods, which require complex interaction and fusion between video and query sentences, the proposed VMLH method needs only a simple XOR operation among codes to locate the video moment with high efficiency. This work lays the foundation for fast video clip positioning and makes large-scale video clip positioning practical. Experimental results on two public datasets demonstrate the effectiveness of the method.
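The XOR-based matching described in the abstract reduces retrieval to bitwise operations on precomputed codes: XOR sets a bit exactly where two hash codes differ, so the popcount of the XOR result is the Hamming distance. The sketch below, which is not from the paper, illustrates this matching step in Python; the function name `hamming_similarity`, the 64-bit code length, and the NumPy representation are illustrative assumptions, and VMLH's learned hash functions and location prediction network are separate components not shown here.

```python
import numpy as np

def hamming_similarity(query_code: int, clip_codes: np.ndarray, n_bits: int = 64) -> np.ndarray:
    """Score each clip hash code against the query hash code.

    XOR marks the differing bits, the popcount of the XOR result is the
    Hamming distance, and (n_bits - distance) / n_bits maps it to a
    similarity in [0, 1]. (Illustrative sketch; codes assumed 64-bit.)
    """
    diff = np.bitwise_xor(clip_codes, np.uint64(query_code))
    # Popcount: view each 64-bit word as 8 bytes, unpack to bits, sum per code.
    distances = np.unpackbits(diff.view(np.uint8)).reshape(len(clip_codes), -1).sum(axis=1)
    return (n_bits - distances) / n_bits

# Toy usage: a set of 100 clip codes for one video, scanned against a query.
rng = np.random.default_rng(0)
clip_codes = rng.integers(0, 2**63, size=100, dtype=np.uint64)
query_code = int(clip_codes[42])            # pretend the query hashes near clip 42
similarities = hamming_similarity(query_code, clip_codes)
print(int(np.argmax(similarities)))         # -> 42; such scores would feed the location predictor
```

Because the clip hash codes can be computed and stored offline, the retrieval-time cost is just this XOR/popcount scan, which is what gives the hashing approach its efficiency advantage over methods that fuse video and query features inside a network at query time.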
Funder
National Natural Science Foundation of China; Shandong Provincial Natural Science Foundation for Distinguished Young Scholars; Shandong Provincial Natural Science Foundation; Taishan Scholar Project of Shandong Province
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
1 article.