Effective Video Summarization Using Channel Attention-Assisted Encoder–Decoder Framework
Authors:
Alharbi Faisal 1, Habib Shabana 2, Albattah Waleed 2, Jan Zahoor 3, Alanazi Meshari D. 4, Islam Muhammad 5
Affiliations:
1. Quantum Technologies and Advanced Computing Institute, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia
2. Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia
3. Department of Computer Science, Islamia College Peshawar, Peshawar 25000, Pakistan
4. Department of Electrical Engineering, College of Engineering, Jouf University, Sakaka 72388, Saudi Arabia
5. Department of Electrical Engineering, College of Engineering, Qassim University, Buraydah 52571, Saudi Arabia
Abstract
A large number of cameras continuously generate massive amounts of data, demanding hardware, time, and labor to acquire, process, and monitor. Asymmetric frames within videos pose a challenge to automatic video summarization, making it difficult to capture key content. Developments in computer vision have accelerated the seamless capture and analysis of high-resolution video content. Video summarization (VS) has garnered considerable interest due to its ability to provide concise summaries of lengthy videos. The current literature mainly relies on a reduced set of representative features implemented using shallow sequential networks. Therefore, this work utilizes an optimal feature-assisted visual intelligence framework for representative feature selection and summarization. First, we perform an empirical analysis of several features and ultimately adopt a fine-tuned InceptionV3 backbone for feature extraction, deviating from conventional approaches. Second, our strategic encoder–decoder module captures complex relationships with five convolutional blocks and two transposed-convolution blocks. Third, we introduce a channel attention mechanism that illuminates the interrelations between channels and prioritizes essential patterns, allowing the network to grasp refined features for final summary generation. Comprehensive experiments and ablation studies validate our framework's performance, which consistently surpasses state-of-the-art networks on two benchmark datasets (TVSum and SumMe).
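The channel attention step described in the abstract can be sketched as a squeeze-and-excitation-style reweighting of feature-map channels: globally pool each channel, pass the pooled vector through a small bottleneck, and scale the channels by the resulting sigmoid gates. The paper's exact implementation is not given here, so the weight shapes, reduction bottleneck, and function names below are illustrative assumptions, not the authors' code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feats, w1, w2):
    """Squeeze-and-excitation-style channel attention (illustrative sketch).

    feats: (C, H, W) feature map, e.g. from an InceptionV3-style backbone
    w1:    (C // r, C) bottleneck weights (r = reduction ratio, assumed)
    w2:    (C, C // r) expansion weights
    """
    # Squeeze: global average pool over the spatial dimensions -> (C,)
    squeeze = feats.mean(axis=(1, 2))
    # Excite: bottleneck MLP with ReLU, then sigmoid gates in (0, 1)
    gates = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))
    # Reweight: scale each channel by its gate, prioritizing useful channels
    return feats * gates[:, None, None]
```

The gates depend on all channels jointly, which is how the mechanism captures inter-channel relationships before the decoder produces frame-level summary scores.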