Deep Unsupervised Key Frame Extraction for Efficient Video Classification

Published: 2023-02-25 Issue: 3 Volume: 19 Page: 1-17
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Tang Hao¹, Ding Lei², Wu Songsong³, Ren Bin², Sebe Nicu², Rota Paolo²

Affiliation:

1. ETH Zurich, Zurich, Switzerland

2. University of Trento, Trento, Italy

3. Guangdong University of Petrochemical Technology, Maoming, China

Abstract

Video processing and analysis have become urgent tasks, as a huge number of videos are uploaded to online platforms (e.g., YouTube, Hulu) every day. Extracting representative key frames from videos is important in video processing and analysis because it greatly reduces the computing resources and time required. Although great progress has been made recently, large-scale video classification remains an open problem, as existing methods do not balance performance and efficiency well. To tackle this problem, this work presents an unsupervised method for retrieving key frames that combines a convolutional neural network with temporal segment density peaks clustering. The proposed temporal segment density peaks clustering is a generic and powerful framework with two advantages over previous works: it calculates the number of key frames automatically, and it preserves the temporal information of the video. It thus improves the efficiency of video classification. Furthermore, a long short-term memory network is added on top of the convolutional neural network to further improve classification performance. Moreover, a weight fusion strategy for different input networks is presented to boost performance. By optimizing video classification and key frame extraction jointly, we achieve better classification performance and higher efficiency. We evaluate our method on two popular datasets (i.e., HMDB51 and UCF101), and the experimental results consistently demonstrate that our strategy achieves competitive performance and efficiency compared with state-of-the-art approaches.
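
To make the clustering idea in the abstract more concrete, the following is a minimal sketch of generic density peaks clustering applied separately to each temporal segment of a video. It is not the authors' implementation: the function names `density_peak_scores` and `select_key_frames`, the Gaussian-kernel density, the quantile-based cutoff distance, the number of segments, and the mean-plus-standard-deviation threshold are all assumptions chosen for illustration. The per-frame feature vectors are assumed to come from some upstream model such as a CNN.

```python
import numpy as np


def density_peak_scores(features, cutoff_quantile=0.2):
    """Return (rho, delta) per frame; expects at least two rows in `features`."""
    n = len(features)
    # Pairwise Euclidean distances between frame feature vectors.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Cutoff distance d_c as a quantile of off-diagonal distances (assumed heuristic).
    off_diag = dist[~np.eye(n, dtype=bool)]
    d_c = max(float(np.quantile(off_diag, cutoff_quantile)), 1e-12)
    # Gaussian-kernel local density, minus the self-contribution exp(0) = 1.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    # delta: distance to the nearest frame ranked higher in density.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    delta[order[0]] = dist[order[0]].max()  # densest frame gets the largest distance
    for rank in range(1, n):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()
    return rho, delta


def select_key_frames(features, n_segments=3, n_std=1.0):
    """Pick key-frame indices per temporal segment; the count is not fixed in advance."""
    features = np.asarray(features, dtype=float)
    key_frames = []
    # Splitting the index range keeps each segment contiguous in time.
    for idx in np.array_split(np.arange(len(features)), n_segments):
        if len(idx) == 0:
            continue
        if len(idx) == 1:
            key_frames.append(int(idx[0]))
            continue
        rho, delta = density_peak_scores(features[idx])
        gamma = rho * delta  # high density and far from any denser frame
        # Assumed rule: keep frames whose score stands out within the segment.
        threshold = gamma.mean() + n_std * gamma.std()
        chosen = np.flatnonzero(gamma > threshold)
        if len(chosen) == 0:
            chosen = np.array([gamma.argmax()])  # always keep at least one frame
        key_frames.extend(int(idx[j]) for j in chosen)
    return sorted(key_frames)
```

Called as `select_key_frames(frame_features)`, the sketch returns frame indices in temporal order, with at least one index per non-empty segment. Because clustering runs inside each segment rather than over the whole video, the selected frames stay spread along the timeline, which is one plausible reading of how temporal information is preserved.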

Funder

PRIN project PREVUE

EU H2020 project AI4Media

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3571735

References: 77 articles.

1. Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi, and Stephen Gould. 2016. Dynamic image networks for action recognition. In Proceedings of CVPR.

2. Zhuowei Cai, Limin Wang, Xiaojiang Peng, and Yu Qiao. 2014. Multi-view super vector for action recognition. In Proceedings of CVPR.

3. Joao Carreira and Andrew Zisserman. 2017. Quo Vadis, action recognition? A new model and the kinetics dataset. In Proceedings of CVPR.

4. Information theory-based shot cut/fade detection and video summarization

5. Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, and Cordelia Schmid. 2018. Potion: Pose motion representation for action recognition. In Proceedings of CVPR.

Cited by 13 articles.

1. Opposition-based optimized max pooled 3D convolutional features for action video retrieval; International Journal of Information Technology; 2024-08-12

2. Pornographic video detection based on semantic and image enhancement; The Computer Journal; 2024-07-27

3. Spatial-temporal multiscale feature optimization based two-stream convolutional neural network for action recognition; Cluster Computing; 2024-06-01

4. Effective Video Summarization by Extracting Parameter-Free Motion Attention; ACM Transactions on Multimedia Computing, Communications, and Applications; 2024-05-16

5. MDJ: A multi-scale difference joint keyframe extraction algorithm for infrared surveillance video action recognition; Digital Signal Processing; 2024-05