Abstract
In recent years, graph convolutional networks (GCNs) have driven great progress in skeleton-based action recognition. Most existing methods, however, extract spatial features from skeleton data with a fixed adjacency matrix and a fixed graph structure, which usually leads to weak spatial modeling ability, unsatisfactory generalization performance, and an excessive number of model parameters. In the temporal dimension, most of these methods follow the ST-GCN approach, which inevitably processes many non-key frames, increasing the computational cost of feature extraction and slowing the model down. In this paper, a gated temporally and spatially adaptive graph convolutional network is proposed. On the one hand, a learnable parameter matrix that adaptively learns the key information of the skeleton data in the spatial dimension is added to the graph convolution layer, improving the feature extraction and generalizability of the model while reducing the number of parameters. On the other hand, a gated unit is added to the temporal feature extraction module to alleviate interference from non-critical frames and reduce computational complexity. A channel attention mechanism based on an SE module and a frame attention mechanism are used to enhance the model's feature extraction ability. To prevent model degradation and ensure more stable training, residual links are added to each feature extraction module. The proposed approach achieves 0.63% higher accuracy on the X-Sub benchmark with 4.46 M fewer parameters than GAT, one of the best SOTA methods. The inference speed of our model reaches 86.23 sequences/(second × GPU). Extensive experimental results on three large-scale datasets, namely NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton, further validate the effectiveness of the proposed approach.
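The two central ideas in the abstract, a learnable matrix added to the graph convolution and a gated unit in the temporal module, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption: the learnable matrix `B` is simply added to the fixed adjacency `A`, the gate is a GLU-style sigmoid gate, and all function and parameter names are hypothetical. The SE channel attention and frame attention are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_graph_conv(x, A, B, W):
    """Spatial graph convolution with an additive learnable matrix (assumed form).

    x: (T, V, C_in) skeleton features (frames, joints, channels)
    A: (V, V) fixed adjacency from the skeleton topology
    B: (V, V) learnable matrix, trained with the rest of the model
    W: (C_in, C_out) channel projection
    """
    A_eff = A + B  # the graph is no longer fixed once B is non-zero
    # Aggregate over neighbouring joints: out[t, v] = sum_u A_eff[v, u] * x[t, u]
    agg = np.einsum("vu,tuc->tvc", A_eff, x)
    return agg @ W

def gated_temporal_conv(x, W_f, W_g):
    """Temporal convolution with a GLU-style gate (assumed form).

    x: (T, V, C); W_f, W_g: (K, C, C) temporal kernels with odd K.
    The sigmoid gate scales each frame's features, so frames that the
    gate scores low contribute little to the output.
    """
    K = W_f.shape[0]
    pad = K // 2
    T = x.shape[0]
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)))  # zero-pad along time
    feat = np.zeros_like(x)
    gate = np.zeros_like(x)
    for k in range(K):
        window = xp[k:k + T]          # input shifted by tap k
        feat += window @ W_f[k]
        gate += window @ W_g[k]
    return feat * sigmoid(gate)

def block(x, A, B, W, W_f, W_g):
    """One spatial-temporal block with a residual link (requires C_in == C_out)."""
    h = np.maximum(adaptive_graph_conv(x, A, B, W), 0.0)  # ReLU
    return x + gated_temporal_conv(h, W_f, W_g)

# Toy usage: 4 frames, 3 joints, 2 channels, temporal kernel size 3.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 2))
A, B = np.eye(3), np.zeros((3, 3))
W = np.eye(2)
W_f = rng.standard_normal((3, 2, 2))
W_g = rng.standard_normal((3, 2, 2))
y = block(x, A, B, W, W_f, W_g)
```

With `B` set to zero the spatial layer reduces to an ordinary fixed-graph convolution, which makes the role of the learnable matrix easy to isolate when experimenting.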
Funder
Natural Science Foundation of Sichuan Province
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
7 articles.