
AR3D: Attention Residual 3D Network for Human Action Recognition

Published:2021-02-28 Issue:5 Volume:21 Page:1656
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Dong Min, Fang Zhenglin, Li Yongfa, Bi Sheng, Chen Jiangcheng

Abstract

In video-based human action recognition, deep neural networks currently fall into two main branches: 2D convolutional neural networks (CNNs) and 3D CNNs. A 2D CNN extracts temporal and spatial features independently of each other, so it can easily miss the connection between them, which hurts recognition performance. A 3D CNN can extract the temporal and spatial features of a video sequence at the same time, but the parameters of the 3D model increase exponentially, which makes the model difficult to train and transfer. To address this problem, this article combines a 3D CNN with a residual structure and an attention mechanism to improve the existing 3D CNN model, and proposes two human action recognition models: the Residual 3D Network (R3D) and the Attention Residual 3D Network (AR3D). First, we propose a shallow feature extraction module and improve the ordinary 3D residual structure, which reduces the number of parameters and strengthens the extraction of temporal features. Second, we explore the application of the attention mechanism to human action recognition and design a 3D spatio-temporal attention module to strengthen the extraction of global features of human action. Finally, to make full use of both the residual structure and the attention mechanism, we propose AR3D and describe in detail its two fusion strategies and the corresponding model structures (AR3D_V1 and AR3D_V2). Experiments show that the fused structures achieve varying degrees of performance improvement over either structure alone.
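The parameter cost the abstract attributes to 3D CNNs can be illustrated with a small calculation. The sketch below counts the learnable parameters of a single 2D and a single 3D convolution layer. The helper name `conv_params`, the 3x3 and 3x3x3 kernels, and the 64-channel widths are assumptions chosen for illustration; they are not taken from the paper and do not describe the R3D or AR3D layers.

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Count learnable parameters in one convolution layer.

    kernel: tuple of kernel sizes, e.g. (3, 3) for 2D or (3, 3, 3) for 3D.
    c_in, c_out: number of input and output channels.
    """
    weights = c_in * c_out
    for k in kernel:
        weights *= k  # one weight per kernel position per channel pair
    return weights + (c_out if bias else 0)


# Hypothetical layer sizes, chosen only for illustration.
p2d = conv_params((3, 3), 64, 64)
p3d = conv_params((3, 3, 3), 64, 64)

print(p2d, p3d, round(p3d / p2d, 2))  # prints: 36928 110656 3.0
```

With these assumed sizes, adding the temporal dimension roughly triples the weights of one layer, and the extra cost accumulates across every layer of a deep network. This is the kind of overhead that parameter-reducing residual designs aim to limit.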

Funder

National Natural Science Foundation of China

Guangdong Province Science and Technology Plan Projects

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/21/5/1656/pdf

References: 25 articles.

1. Two-stream convolutional networks for action recognition in videos;Simonyan,2014

Cited by 24 articles.

1. Enhancing Human Activity Recognition through Integrated Multimodal Analysis: A Focus on RGB Imaging, Skeletal Tracking, and Pose Estimation;Sensors;2024-07-17

2. Action recognition method based on a novel keyframe extraction method and enhanced 3D convolutional neural network;International Journal of Machine Learning and Cybernetics;2024-06-11

3. Multi-scale perceptual YOLO for automatic detection of clue cells and trichomonas in fluorescence microscopic images;Computers in Biology and Medicine;2024-06

4. Human Action Recognition Utilizing Doppler-Enhanced Convolutional 3D Networks;2024 IEEE International Conference on Big Data and Smart Computing (BigComp);2024-02-18

5. Solving traffic data occlusion problems in computer vision algorithms using DeepSORT and quantum computing;Journal of Traffic and Transportation Engineering (English Edition);2024-02