Video Highlight Detection via Region-Based Deep Ranking Model

Published: 2019-06-07; Issue: 07; Volume: 33; Page: 1940001
ISSN: 0218-0014
Container-title: International Journal of Pattern Recognition and Artificial Intelligence
Language: en
Short-container-title: Int. J. Patt. Recogn. Artif. Intell.

Author:

Jiao Yifan¹,², Zhang Tianzhu², Huang Shucheng¹, Liu Bin³, Xu Changsheng²

Affiliation:

1. Jiangsu University of Science and Technology, Zhenjiang 212003, P. R. China

2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P. R. China

3. Moshanghua Tech Company, Ltd., Beijing 100030, P. R. China

Abstract

The goal of video highlight detection is to localize key elements (moments of major or special interest to the user) in a video. Most existing highlight detection approaches extract features from each video segment as a whole, without considering spatial differences among local features. Spatially, not all regions are worth watching: some contain only the background environment, with no humans or other moving objects, a problem that is especially pronounced when the background is heavily cluttered. To address this issue, we propose a novel region-based model that automatically localizes the key elements in a video without any extra supervised annotations. Specifically, the proposed model produces position-sensitive score maps for local regions in the spatial dimension of each video segment and then aggregates all position-wise scores with a position-pooling operation. Regions with higher response values are extracted as key elements, so more effective features of the video segment are obtained for predicting the highlight score. The proposed position-sensitive scheme can be easily integrated into an end-to-end fully convolutional network whose parameters are updated by stochastic gradient descent during backpropagation, which improves the robustness of the model. Extensive experiments on the YouTube and SumMe datasets demonstrate that the proposed approach achieves significant improvements over state-of-the-art methods.
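The abstract names three mechanisms: position-sensitive score maps, a position-pooling step that aggregates position-wise scores, and the selection of high-response regions as key elements. The following pure-Python sketch illustrates those ideas under our own assumptions, not the paper's exact design: it follows the R-FCN-style layout (one score map per spatial bin, with each bin averaging only its own map), treats position pooling as a plain mean over bins, and pairs the segment score with a pairwise hinge ranking loss, a common choice for highlight ranking that the abstract does not confirm. All function names (`bin_bounds`, `position_sensitive_pool`, `position_pooling`, `top_regions`, `pairwise_ranking_loss`) are hypothetical.

```python
"""Sketch of R-FCN-style position-sensitive pooling for scoring a video segment.

Assumptions (not confirmed by the paper): one score map per spatial bin,
position pooling as a plain mean over bins, and a pairwise hinge ranking loss.
Score maps are nested lists: maps[channel][row][col].
"""
from typing import List, Tuple

Grid = List[List[float]]


def bin_bounds(size: int, k: int, idx: int) -> Tuple[int, int]:
    """Half-open [start, end) range of bin `idx` when splitting `size` cells into k bins."""
    start = (idx * size) // k
    end = ((idx + 1) * size) // k
    return start, max(end, start + 1)  # keep at least one cell per bin


def position_sensitive_pool(score_maps: List[Grid], k: int) -> Grid:
    """Average channel (i * k + j) inside spatial bin (i, j), giving a k x k grid."""
    if len(score_maps) != k * k:
        raise ValueError("expected one score map per spatial bin (k * k maps)")
    height, width = len(score_maps[0]), len(score_maps[0][0])
    bins = [[0.0] * k for _ in range(k)]
    for i in range(k):
        r0, r1 = bin_bounds(height, k, i)
        for j in range(k):
            c0, c1 = bin_bounds(width, k, j)
            channel = score_maps[i * k + j]  # each bin reads only its own map
            cells = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            bins[i][j] = sum(cells) / len(cells)
    return bins


def position_pooling(bins: Grid) -> float:
    """Aggregate all position-wise bin scores into a single segment score (mean)."""
    flat = [v for row in bins for v in row]
    return sum(flat) / len(flat)


def top_regions(bins: Grid, m: int) -> List[Tuple[int, int]]:
    """(row, col) indices of the m bins with the highest response, best first."""
    k = len(bins)
    positions = [(i, j) for i in range(k) for j in range(k)]
    positions.sort(key=lambda p: bins[p[0]][p[1]], reverse=True)
    return positions[:m]


def pairwise_ranking_loss(highlight: float, non_highlight: float, margin: float = 1.0) -> float:
    """Hinge loss that is zero once the highlight outscores the other segment by `margin`."""
    return max(0.0, margin - (highlight - non_highlight))


if __name__ == "__main__":
    k = 2
    # Four 4x4 score maps filled with 0.1; map 3 (bottom-right bin) fires in its own bin.
    maps = [[[0.1] * 4 for _ in range(4)] for _ in range(k * k)]
    for r in range(2, 4):
        for c in range(2, 4):
            maps[3][r][c] = 0.9
    bins = position_sensitive_pool(maps, k)
    score = position_pooling(bins)
    print("bin scores:", bins)
    print("segment score:", score)
    print("key region:", top_regions(bins, 1))
    print("loss vs. a 0.1-scored segment:", pairwise_ranking_loss(score, 0.1))
```

In an actual network, a convolutional layer would produce the `k * k` score maps from per-segment features and the ranking loss would be backpropagated through the pooling; plain lists are used here only to keep the example dependency-free.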

Publisher

World Scientific Pub Co Pte Ltd

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218001419400019

Cited by 6 articles.

1. Dual-Stream Multimodal Learning for Topic-Adaptive Video Highlight Detection; Proceedings of the 2023 ACM International Conference on Multimedia Retrieval; 2023-06-12

2. Show Me What I Like: Detecting User-Specific Video Highlights Using Content-Based Multi-Head Attention; Proceedings of the 30th ACM International Conference on Multimedia; 2022-10-10

3. Multimodal learning model based on video–audio–chat feature fusion for detecting e-sports highlights; Applied Soft Computing; 2022-09

4. HighlightMe: Detecting Highlights from Human-Centric Videos; 2021 IEEE/CVF International Conference on Computer Vision (ICCV); 2021-10

5. Research Issues & State of the Art Challenges in Event Detection; 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM); 2021-01-04