Video Retrieval with Similarity-Preserving Deep Temporal Hashing

Published:2019-11-30 Issue:4 Volume:15 Page:1-16
ISSN:1551-6857
Container-title:ACM Transactions on Multimedia Computing, Communications, and Applications
language:en
Short-container-title:ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Shen Ling¹, Hong Richang¹, Zhang Haoran¹, Tian Xinmei², Wang Meng¹

Affiliation:

1. Hefei University of Technology, Hefei, Anhui, China

2. University of Science and Technology of China, Hefei, Anhui, China

Abstract

Despite the fact that remarkable progress has been made in recent years, Content-based Video Retrieval (CBVR) is still an appealing research topic due to increasing search demands in the Internet era of big data. This article aims to explore an efficient CBVR system by discriminately hashing videos into short binary codes. Existing video hashing methods usually encounter two weaknesses originating from the following sources: (1) Most works adopt the separated stages method or the frame-pooling based end-to-end architecture. However, the spatial-temporal properties of videos cannot be fully explored or kept well in the follow-up hashing step. (2) Discriminative learning based on pairwise or triplet constraints often suffers from slow convergence and poor local optimization, mainly because of the limited samples for each update. To alleviate these problems, we propose an end-to-end video retrieval framework called the Similarity-Preserving Deep Temporal Hashing (SPDTH) network. Specifically, we equip the model with the ability to capture spatial-temporal properties of videos and to generate binary codes by stacked Gated Recurrent Units (GRUs). It unifies video temporal modeling and learning to hash into one step to allow for maximum retention of information. We also introduce a deep metric learning objective called ℓ2All_loss for network training by preserving intra-class similarity and inter-class separability, and a quantization loss between the real-valued outputs and the binary codes is minimized. Extensive experiments on several challenging datasets demonstrate that SPDTH can consistently outperform state-of-the-art methods.
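The abstract mentions a quantization loss between real-valued outputs and binary codes, and retrieval over short binary codes. The following is a minimal sketch of those two ideas only, not the SPDTH implementation: the sign-based binarization, the mean-squared form of the loss, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def binarize(outputs: Sequence[float]) -> List[int]:
    """Map real-valued outputs to a {-1, +1} code by sign (assumed scheme)."""
    return [1 if x >= 0 else -1 for x in outputs]


def quantization_loss(outputs: Sequence[float]) -> float:
    """Mean squared gap between outputs and their binary codes (assumed form)."""
    codes = binarize(outputs)
    return sum((x - b) ** 2 for x, b in zip(outputs, codes)) / len(outputs)


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by Hamming distance to the query code."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))
```

In this sketch, outputs that already lie near -1 or +1 incur almost no quantization loss, which is the usual motivation for minimizing such a term before thresholding to binary codes.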

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3356316

Cited by 25 articles.

1. Stacked collaborative transformer network with contrastive learning for video moment localization;Intelligent Data Analysis;2024-06-20

2. Supervised Hierarchical Online Hashing for Cross-modal Retrieval;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-01-11

3. Efficient Unsupervised Video Hashing With Contextual Modeling and Structural Controlling;IEEE Transactions on Multimedia;2024

4. Research on Video Retrieval Technology based on Multimodal Fusion and Attention Mechanism;Proceedings of the 2023 7th International Conference on Electronic Information Technology and Computer Engineering;2023-10-20

5. YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation;EURASIP Journal on Audio, Speech, and Music Processing;2023-10-19