Human Action Representation Learning Using an Attention-Driven Residual 3DCNN Network

Published: 2023-07-31
Volume: 16
Issue: 8
Page: 369
ISSN: 1999-4893
Journal: Algorithms
Language: English

Author:

Ullah Hayat¹, Munir Arslan¹

Affiliation:

1. Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA

Abstract

The recognition of human activities using vision-based techniques has become a crucial research field in video analytics. Over the last decade, there have been numerous advancements in deep learning algorithms aimed at accurately detecting complex human actions in video streams. While these algorithms have demonstrated impressive performance in activity recognition, they often exhibit a bias towards either model performance or computational efficiency. This biased trade-off between robustness and efficiency poses challenges when addressing complex human activity recognition problems. To address this issue, this paper presents a computationally efficient yet robust approach, exploiting saliency-aware spatial and temporal features for human action recognition in videos. To achieve effective representation of human actions, we propose an efficient approach called the dual-attentional Residual 3D Convolutional Neural Network (DA-R3DCNN). Our proposed method utilizes a unified channel-spatial attention mechanism, allowing it to efficiently extract significant human-centric features from video frames. By combining dual channel-spatial attention layers with residual 3D convolution layers, the network becomes more discerning in capturing spatial receptive fields containing objects within the feature maps. To assess the effectiveness and robustness of our proposed method, we have conducted extensive experiments on four well-established benchmark datasets for human action recognition. The quantitative results obtained validate the efficiency of our method, showcasing significant improvements in accuracy of up to 11% as compared to state-of-the-art human action recognition methods. Additionally, our evaluation of inference time reveals that the proposed method achieves up to a 74× improvement in frames per second (FPS) compared to existing approaches, thus showing the suitability and effectiveness of the proposed DA-R3DCNN for real-time human activity recognition.
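The abstract describes attaching channel-spatial attention to residual 3D convolution layers but gives no layer-level detail. The following is a minimal NumPy sketch of that general idea, not the authors' DA-R3DCNN: it assumes a CBAM-style design (a channel gate followed by a spatial gate) applied to the output of two 3D convolutions before the skip connection is added. All function names, the reduction ratio, kernel size, and random weights are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv3d_same(x, w):
    """Naive 3D cross-correlation, stride 1, zero 'same' padding.
    x: (C_in, T, H, W); w: (C_out, C_in, k, k, k) with odd k."""
    c_out, _, k, _, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p), (p, p)))
    _, T, H, W = x.shape
    out = np.zeros((c_out, T, H, W))
    for dt in range(k):
        for dh in range(k):
            for dw in range(k):
                patch = xp[:, dt:dt + T, dh:dh + H, dw:dw + W]
                out += np.einsum('oc,cthw->othw', w[:, :, dt, dh, dw], patch)
    return out

def channel_attention(x, w1, w2):
    """Assumed CBAM-style channel gate: pool over (T, H, W), shared MLP, sigmoid."""
    avg = x.mean(axis=(1, 2, 3))
    mx = x.max(axis=(1, 2, 3))
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    gate = sigmoid(mlp(avg) + mlp(mx))            # one weight per channel
    return x * gate[:, None, None, None]

def spatial_attention(x, w_s):
    """Assumed CBAM-style spatial gate: pool over channels, 3D conv, sigmoid."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, T, H, W)
    gate = sigmoid(conv3d_same(pooled, w_s))             # (1, T, H, W)
    return x * gate

def attention_residual_block(x, params):
    """Two 3D convs, then channel and spatial gating, then the skip connection."""
    y = np.maximum(conv3d_same(x, params['conv1']), 0.0)
    y = conv3d_same(y, params['conv2'])
    y = channel_attention(y, params['fc1'], params['fc2'])
    y = spatial_attention(y, params['spatial'])
    return np.maximum(x + y, 0.0)

def init_params(rng, channels, reduction=2, k=3, scale=0.1):
    """Random illustrative weights (hypothetical; not trained values)."""
    hidden = channels // reduction
    return {
        'conv1': scale * rng.standard_normal((channels, channels, k, k, k)),
        'conv2': scale * rng.standard_normal((channels, channels, k, k, k)),
        'fc1': scale * rng.standard_normal((hidden, channels)),
        'fc2': scale * rng.standard_normal((channels, hidden)),
        'spatial': scale * rng.standard_normal((1, 2, k, k, k)),
    }

# Toy feature map: 4 channels, 3 frames, 5x5 spatial grid.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 5, 5))
params = init_params(rng, channels=4)
out = attention_residual_block(x, params)
print(out.shape)  # (4, 3, 5, 5)
```

The block preserves the input shape, so blocks of this kind can be stacked. Both gates only rescale the convolutional branch by factors between 0 and 1, while the skip connection passes the input through unchanged.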

Funder

Air Force Office of Scientific Research

Publisher

MDPI AG

Subject

Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science

Link

https://www.mdpi.com/1999-4893/16/8/369/pdf

Cited by 1 article.

1. A hybrid deep learning framework for daily living human activity recognition with cluster-based video summarization. Multimedia Tools and Applications, 2024-04-18.