Author:
Koh Thean Chun, Yeo Chai Kiat, Jing Xuan, Sivadas Sunil
Abstract
Given the prevalence of surveillance cameras in our daily lives, human action recognition from videos holds significant practical value. A persistent challenge in this field is to develop more efficient models capable of real-time recognition with high accuracy for widespread deployment. In this paper, we introduce a novel human action recognition model named Context-Aware Memory Attention Network (CAMA-Net), which eliminates the need for optical flow extraction and 3D convolution, both of which are computationally intensive. By removing these components, CAMA-Net achieves superior computational efficiency compared to many existing approaches. A pivotal component of CAMA-Net is the Context-Aware Memory Attention Module, an attention module that computes relevance scores between key-value pairs obtained from the 2D ResNet backbone, thereby establishing correspondences between video frames. To validate our method, we conduct experiments on four well-known action recognition datasets: ActivityNet, Diving48, HMDB51 and UCF101. The experimental results demonstrate the effectiveness of the proposed model, which surpasses existing 2D-CNN based baseline models.
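The abstract does not give implementation details, so the following is a minimal, hypothetical PyTorch sketch of cross-frame key-value attention in the spirit described above: per-frame features from a 2D backbone are projected into queries, keys and values, relevance scores are computed between queries and keys across all frames, and the values are aggregated by those scores. The class name CrossFrameAttention, the projection dimension dim and every other specific here are illustrative assumptions, not the authors' actual CAMA-Net module.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossFrameAttention(nn.Module):
    # Hypothetical sketch: scores each frame's queries against the keys
    # of all frames and aggregates values, establishing correspondences
    # between video frames without optical flow or 3D convolution.
    def __init__(self, channels: int, dim: int = 128):
        super().__init__()
        self.to_query = nn.Conv2d(channels, dim, kernel_size=1)
        self.to_key = nn.Conv2d(channels, dim, kernel_size=1)
        self.to_value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, C, H, W) per-frame features from a 2D CNN backbone
        b, t, c, h, w = feats.shape
        x = feats.reshape(b * t, c, h, w)
        q = self.to_query(x).reshape(b, t, -1, h * w)  # (B, T, D, HW)
        k = self.to_key(x).reshape(b, t, -1, h * w)
        v = self.to_value(x).reshape(b, t, c, h * w)

        # Flatten time and space so every position attends across frames.
        q = q.permute(0, 1, 3, 2).reshape(b, t * h * w, -1)  # (B, THW, D)
        k = k.permute(0, 1, 3, 2).reshape(b, t * h * w, -1)
        v = v.permute(0, 1, 3, 2).reshape(b, t * h * w, c)

        # Scaled relevance scores between queries and keys, then aggregate.
        attn = F.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        out = attn @ v  # (B, THW, C)
        out = out.reshape(b, t, h * w, c).permute(0, 1, 3, 2)
        return feats + out.reshape(b, t, c, h, w)  # residual connection

Because the projections are 1x1 convolutions over 2D feature maps, temporal reasoning comes entirely from the attention step, which is the property the abstract attributes to the memory attention design.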
Article Highlights
Recent human action recognition models are not yet ready for practical applications due to their high computational demands.
We propose a 2D CNN-based human action recognition method to reduce the computation load.
The proposed method achieves competitive performance against most state-of-the-art 2D CNN-based methods on public datasets.
Funder
RIE2020 Industry Alignment Fund – Industry Collaboration Projects
Publisher
Springer Science and Business Media LLC
Subject
General Earth and Planetary Sciences, General Physics and Astronomy, General Engineering, General Environmental Science, General Materials Science, General Chemical Engineering
Cited by
3 articles.