Action Recognition Based on Multi-Level Topological Channel Attention of Human Skeleton
Author:
Hu Kai 1,2, Shen Chaowen 1, Wang Tianyan 1, Shen Shuai 1, Cai Chengxue 1, Huang Huaming 3, Xia Min 1,2
Affiliation:
1. School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
2. CICAEET, Nanjing University of Information Science and Technology, Nanjing 210044, China
3. Department of Physical Education, Nanjing University of Information Science and Technology, Nanjing 210044, China
Abstract
In action recognition, extracting skeleton data from human poses is valuable because it suppresses environmental noise such as changes in background and lighting conditions. Although graph convolutional networks (GCNs) can learn discriminative action features, they fail to fully exploit prior knowledge of human body structure and the coordination relations between limbs. To address these issues, this paper proposes a Multi-level Topological Channel Attention Network: first, the Multi-level Topology and Channel Attention Module incorporates prior knowledge of human body structure in a coarse-to-fine manner, effectively extracting action features; second, the Coordination Module exploits the contralateral and ipsilateral coordinated movements known from human kinematics; finally, the Multi-scale Global Spatio-temporal Attention Module captures spatiotemporal features at different granularities and incorporates a causal convolution block and masked temporal attention to prevent the model from learning non-causal relationships. The method achieves accuracies of 91.9% (Xsub) and 96.3% (Xview) on NTU-RGB+D 60, and 88.5% (Xsub) and 90.3% (Xset) on NTU-RGB+D 120.
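The causality constraint mentioned in the abstract can be illustrated concretely. Below is a minimal PyTorch sketch of the two mechanisms named there: temporal self-attention with a causal mask (each frame attends only to itself and earlier frames) and a temporal convolution padded only on the left so its receptive field never covers future frames. The abstract does not specify the paper's exact layer design, so the class names, head count, and kernel size here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedTemporalAttention(nn.Module):
    """Self-attention over the time axis with an upper-triangular (causal) mask,
    so each frame attends only to itself and earlier frames."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(channels, channels * 3)
        self.proj = nn.Linear(channels, channels)

    def forward(self, x):
        # x: (batch, time, channels) -- per-frame skeleton features
        b, t, c = x.shape
        qkv = self.qkv(x).reshape(b, t, 3, self.heads, c // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, t, c // heads)
        attn = (q @ k.transpose(-2, -1)) / (c // self.heads) ** 0.5
        # Mask out attention to future frames before the softmax.
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        attn = attn.masked_fill(causal, float('-inf')).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, c)
        return self.proj(out)

class CausalTemporalConv(nn.Module):
    """1-D temporal convolution with left-only padding, so the output at
    frame t depends only on frames <= t."""
    def __init__(self, channels, kernel_size=9):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):
        # x: (batch, time, channels) -> pad only the left side of the time axis
        y = F.pad(x.transpose(1, 2), (self.pad, 0))
        return self.conv(y).transpose(1, 2)
```

In both blocks the design choice is the same: information is allowed to flow only from past to present, which is what the abstract means by preventing non-causal relationships in the temporal model.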
Funder
National Natural Science Foundation of China
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
2 articles.