Affiliation:
1. School of Electronic Information, Wuhan University, Wuhan 430072, China
2. Hubei Three Gorges Laboratory, Yichang 443007, China
Abstract
Human action recognition is a computer vision task that involves identifying and classifying human movements and activities. Human actions are composed of movements of multiple body parts, and Graph Convolutional Networks (GCNs) have emerged as a promising approach for this task. However, most contemporary GCN methods perform graph convolution on the entire skeleton graph without considering that the human body consists of distinct body parts. To address this limitation, we propose a novel method that optimizes the representation of the skeleton graph by designing temporal and spatial convolutional blocks and introducing the Part-wise Adaptive Topology Graph Convolution (PAT-GC) technique. PAT-GC adaptively learns the segmentation of different body parts and dynamically integrates the spatial relevance between them. Furthermore, we use hierarchical modeling to divide the skeleton graph, capturing a more comprehensive representation of the human body. We evaluate our approach on three large, publicly available datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400. Our experimental results demonstrate that our approach achieves state-of-the-art performance, validating the effectiveness of the proposed technique for human action recognition.
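The idea of combining joint-level graph convolution with learned body-part grouping can be illustrated with a minimal NumPy sketch. This is not the authors' PAT-GC implementation, which the abstract does not specify. The function names, the soft joint-to-part assignment via `part_logits`, the 5-joint chain skeleton, and the way part context is added back to the joints are all assumptions chosen for illustration.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def part_wise_graph_conv(X, A, W, part_logits):
    """Hypothetical part-aware graph convolution (illustrative only).

    X: (V, C_in) joint features, A: (V, V) skeleton adjacency,
    W: (C_in, C_out) weights, part_logits: (V, P) learnable scores
    that softly assign each joint to one of P body parts.
    """
    S = softmax(part_logits, axis=1)           # (V, P), each row sums to 1
    H = normalize_adjacency(A) @ X @ W         # joint-level aggregation along bones
    # Part features: assignment-weighted mean of the joints in each part.
    part_feat = (S.T @ H) / (S.sum(axis=0)[:, None] + 1e-8)  # (P, C_out)
    # Give every joint the context of the parts it belongs to.
    return H + S @ part_feat

# Toy skeleton (assumption): 5 joints connected in a chain, 2 body parts.
V, C_in, C_out, P = 5, 3, 4, 2
A = np.zeros((V, V))
for i in range(V - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

rng = np.random.default_rng(0)
X = rng.normal(size=(V, C_in))
W = rng.normal(size=(C_in, C_out))
part_logits = rng.normal(size=(V, P))  # would be trained in a real model

out = part_wise_graph_conv(X, A, W, part_logits)
print(out.shape)
```

In a trained model, `W` and `part_logits` would be learned by backpropagation, and the operation would be applied per frame alongside a temporal convolution; the sketch covers only a single spatial step on one frame.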
Funder
National Natural Science Foundation of China Enterprise Innovation and Development Joint Fund
Open and Innovation Fund of Hubei Three Gorges Laboratory
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering