Effective Video Summarization Using Channel Attention-Assisted Encoder–Decoder Framework
Authors:
Alharbi Faisal 1, Habib Shabana 2, Albattah Waleed 2, Jan Zahoor 3, Alanazi Meshari D. 4, Islam Muhammad 5
Affiliations:
1. Quantum Technologies and Advanced Computing Institute, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia
2. Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia
3. Department of Computer Science, Islamia College Peshawar, Peshawar 25000, Pakistan
4. Department of Electrical Engineering, College of Engineering, Jouf University, Sakaka 72388, Saudi Arabia
5. Department of Electrical Engineering, College of Engineering, Qassim University, Buraydah 52571, Saudi Arabia
Abstract
A large number of cameras continuously generate massive amounts of data, demanding hardware, time, and labor to acquire, process, and monitor. Asymmetric frames within videos pose a challenge to automatic video summarization, making it difficult to capture key content. Developments in computer vision have accelerated the seamless capture and analysis of high-resolution video content. Video summarization (VS) has garnered considerable interest due to its ability to provide concise summaries of lengthy videos. The current literature mainly relies on a reduced set of representative features implemented using shallow sequential networks. Therefore, this work utilizes an optimal feature-assisted visual intelligence framework for representative feature selection and summarization. First, we perform an empirical analysis of several features and ultimately adopt a fine-tuned InceptionV3 backbone for feature extraction, deviating from conventional approaches. Second, our strategic encoder–decoder module captures complex relationships with five convolutional blocks and two transposed-convolution blocks. Third, we introduce a channel attention mechanism that illuminates the interrelations between channels and prioritizes essential patterns, allowing the network to grasp refined features for final summary generation. Comprehensive experiments and ablation studies validate our framework's performance, which consistently surpasses state-of-the-art networks on two benchmark datasets (TVSum and SumMe).
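The channel attention step described in the abstract can be sketched as a squeeze-and-excitation-style reweighting of feature-map channels: globally pool each channel, pass the pooled vector through a small bottleneck, and scale the channels by the resulting sigmoid gates. The paper's exact implementation is not given here, so the weight shapes, reduction bottleneck, and function names below are illustrative assumptions, not the authors' code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feats, w1, w2):
    """Squeeze-and-excitation-style channel attention (illustrative sketch).

    feats: (C, H, W) feature map, e.g. from an InceptionV3-style backbone
    w1:    (C // r, C) bottleneck weights (r = reduction ratio, assumed)
    w2:    (C, C // r) expansion weights
    """
    # Squeeze: global average pool over the spatial dimensions -> (C,)
    squeeze = feats.mean(axis=(1, 2))
    # Excite: bottleneck MLP with ReLU, then sigmoid gates in (0, 1)
    gates = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))
    # Reweight: scale each channel by its gate, prioritizing useful channels
    return feats * gates[:, None, None]
```

The gates depend on all channels jointly, which is how the mechanism captures inter-channel relationships before the decoder produces frame-level summary scores.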