Skeleton action recognition via graph convolutional network with self-attention module
Published: 2024
Issue: 4
Volume: 32
Pages: 2848-2864
ISSN: 2688-1594
Container-title: Electronic Research Archive
Short-container-title: era
Author:
Li Min¹, Chen Ke¹, Bai Yunqing², Pei Jihong²
Affiliation:
1. School of Mathematical Sciences, Shenzhen University, Shenzhen 518060, China
2. ATR National Key Laboratory of Defense Technology, Shenzhen University, Shenzhen 518060, China
Abstract
Skeleton-based action recognition is an important but challenging task in video understanding and human-computer interaction. However, existing methods suffer from two deficiencies. On the one hand, most methods rely on manually designed convolution kernels, which cannot capture the spatial-temporal joint dependencies of complex regions. On the other hand, some methods simply apply the self-attention mechanism without a theoretical explanation. In this paper, we propose a unified spatio-temporal graph convolutional network with a self-attention mechanism (SA-GCN) for low-quality motion video data captured from a fixed viewing angle. SA-GCN extracts features efficiently by learning weights between joints at different scales. Specifically, the proposed self-attention mechanism is end-to-end, with a mapping strategy for different nodes that not only characterizes the multi-scale dependencies of joints but also integrates the structural features of the graph with the ability to learn fused features. Moreover, the attention mechanism proposed in this paper can, to some extent, be explained theoretically in terms of GCN, which most existing models do not consider. Extensive experiments on two widely used datasets, NTU-60 RGB+D and NTU-120 RGB+D, demonstrate that SA-GCN significantly outperforms a series of mainstream approaches in terms of accuracy.
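For readers who want a concrete picture of the kind of architecture the abstract describes, the sketch below shows one spatial graph-convolution layer whose fixed skeleton adjacency is fused with a learned self-attention map over joints. This is a minimal illustration under stated assumptions, not the authors' SA-GCN: the layer name, the placeholder identity adjacency, and the time-averaged query/key descriptors are simplifications made for brevity.

# Minimal sketch (assumptions, not the authors' SA-GCN): one spatial GCN
# layer over skeleton joints whose fixed adjacency is fused with a learned
# self-attention map. x has shape (N, C, T, V): batch, channels, frames, joints.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGCNLayer(nn.Module):
    """Hypothetical layer illustrating GCN aggregation plus joint self-attention."""
    def __init__(self, in_channels: int, out_channels: int, num_joints: int):
        super().__init__()
        self.theta = nn.Conv2d(in_channels, out_channels, kernel_size=1)  # query map
        self.phi = nn.Conv2d(in_channels, out_channels, kernel_size=1)    # key map
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)   # feature update
        # Placeholder graph: a real model would use the skeleton's bone adjacency.
        A = torch.eye(num_joints)
        self.register_buffer("A", A / A.sum(dim=1, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-joint descriptors, averaged over time: (N, C', V).
        q = self.theta(x).mean(dim=2)
        k = self.phi(x).mean(dim=2)
        # Scaled dot-product self-attention between joints: (N, V, V).
        attn = F.softmax(
            torch.einsum("ncv,ncw->nvw", q, k) / q.shape[1] ** 0.5, dim=-1
        )
        # Fuse the fixed skeleton graph with the learned attention graph.
        graph = self.A.unsqueeze(0) + attn
        # Each joint w aggregates features from joints v, weighted by graph[w, v].
        return F.relu(torch.einsum("nctv,nwv->nctw", self.proj(x), graph))

# Example shapes for NTU RGB+D skeletons (25 joints, 3-channel coordinates):
# x = torch.randn(8, 3, 64, 25); AttentionGCNLayer(3, 64, 25)(x) -> (8, 64, 64, 25)

Fusing a data-dependent attention matrix with the fixed adjacency is what lets such a layer model dependencies between joints that are not physically connected, which is the kind of multi-scale joint dependency the abstract refers to.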
Publisher
American Institute of Mathematical Sciences (AIMS)