Affiliation:
1. Faculty of Applied Sciences, Macao Polytechnic University, Macau, China
2. Engineering Research Centre of Applied Technology on Machine Translation and Artificial Intelligence, Macao Polytechnic University, Macau, China
Abstract
Video is now a ubiquitous medium on social platforms. Video summarisation has therefore become an important task in information extraction, where the high redundancy among key scenes makes it difficult to retrieve the important content. To address this challenge, this work presents a novel Graph Attention (GAT)-based bi-directional content-adaptive recurrent unit (Bi-CARU) model for video summarisation. The model uses graph attention to transform the visual features of scenes of interest in a video. This transformation is performed by a mechanism called Adaptive Feature-based Transformation (AFT), which extracts visual features and elevates them to a higher-level representation. We also introduce a GAT-based attention model that extracts the dominant features from the attention weights, reflecting the human tendency to attend to transitions and moving objects. The higher-level visual features produced by the attention layer are then fused with the semantic features processed by Bi-CARU. By combining visual and semantic information, the proposed model improves the accuracy of key-scene determination. By reducing the redundancy among salient content, our method provides a competitive and efficient way to summarise videos. Experimental results show that our approach outperforms existing state-of-the-art methods in video summarisation.
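To make the described pipeline concrete, the sketch below illustrates the overall data flow in PyTorch: a graph attention layer over per-frame visual features, a bi-directional recurrent semantic branch, and a fusion head that scores per-frame importance. This is a minimal sketch under stated assumptions, not the authors' implementation: all module names and dimensions are hypothetical, and nn.GRU stands in for the content-adaptive recurrent unit (CARU), whose definition is not given here.

```python
# Minimal sketch of a GAT + bi-directional recurrent summariser.
# Assumptions: frames arrive as precomputed visual feature vectors;
# nn.GRU is a stand-in for CARU; dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention over a fully connected frame graph."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x):                        # x: (T, in_dim), one node per frame
        h = self.proj(x)                         # (T, out_dim)
        T = h.size(0)
        hi = h.unsqueeze(1).expand(T, T, -1)     # (T, T, out_dim)
        hj = h.unsqueeze(0).expand(T, T, -1)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1)), 0.2).squeeze(-1)
        alpha = torch.softmax(e, dim=-1)         # attention weight per frame pair
        return alpha @ h                         # higher-level visual features

class GATBiRNNSummariser(nn.Module):
    """GAT visual branch fused with a bi-directional recurrent semantic branch."""
    def __init__(self, vis_dim=1024, hid_dim=256):
        super().__init__()
        self.gat = GraphAttentionLayer(vis_dim, hid_dim)
        self.birnn = nn.GRU(vis_dim, hid_dim, bidirectional=True, batch_first=True)
        self.score = nn.Linear(3 * hid_dim, 1)   # fused features -> importance score

    def forward(self, frames):                   # frames: (T, vis_dim)
        visual = self.gat(frames)                # (T, hid_dim)
        semantic, _ = self.birnn(frames.unsqueeze(0))   # (1, T, 2*hid_dim)
        fused = torch.cat([visual, semantic.squeeze(0)], dim=-1)
        return torch.sigmoid(self.score(fused)).squeeze(-1)  # per-frame scores

scores = GATBiRNNSummariser()(torch.randn(120, 1024))  # 120 frames
print(scores.shape)  # torch.Size([120])
```

The sketch only mirrors the fusion of a higher-level visual branch with a bi-directional semantic branch as described in the abstract; in the paper's model the recurrent branch is the Bi-CARU unit and the attention follows the AFT mechanism.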
Funder
Macao Polytechnic University