Affiliation:
1. Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Abstract
Temporal action proposal generation is an essential and challenging task in video understanding, which aims to locate the temporal intervals that likely contain the actions of interest. Although great progress has been made, the problem is still far from being well solved. In particular, prevalent methods can handle well only the local dependencies (i.e., short-term dependencies) among adjacent frames but are generally powerless in dealing with the global dependencies (i.e., long-term dependencies) between distant frames. To tackle this issue, we propose CLGNet, a novel Collaborative Local-Global Learning Network for temporal action proposal. The majority of CLGNet is an integration of Temporal Convolution Network and Bidirectional Long Short-Term Memory, in which Temporal Convolution Network is responsible for local dependencies while Bidirectional Long Short-Term Memory takes charge of handling the global dependencies. Furthermore, an attention mechanism called the
background suppression module
is designed to guide our model to focus more on the actions. Extensive experiments on two benchmark datasets, THUMOS’14 and ActivityNet-1.3, show that the proposed method can outperform state-of-the-art methods, demonstrating the strong capability of modeling the actions with varying temporal durations.
Funder
New Generation AI Major Project of Ministry of Science and Technology of China
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence,Theoretical Computer Science
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献