Two-stream temporal enhanced Fisher vector encoding for skeleton-based action recognition-Reference-Cited by-同舟云学术

Two-stream temporal enhanced Fisher vector encoding for skeleton-based action recognition

Published:2022-11-25 Issue: Volume: Page:
ISSN:2199-4536
Container-title:Complex & Intelligent Systems
language:en
Short-container-title:Complex Intell. Syst.

Author:

Tang Jun^ORCID,Liu Baodi^ORCID,Guo Wenhui^ORCID,Wang Yanjiang^ORCID

Abstract

AbstractThe key to skeleton-based action recognition is how to extract discriminative features from skeleton data. Recently, graph convolutional networks (GCNs) are proven to be highly successful for skeleton-based action recognition. However, existing GCN-based methods focus on extracting robust features while neglecting the information of feature distributions. In this work, we aim to introduce Fisher vector (FV) encoding into GCN to effectively utilize the information of feature distributions. However, since the Gaussian Mixture Model (GMM) is employed to fit the global distribution of features, Fisher vector encoding inevitably leads to losing temporal information of actions, which is demonstrated by our analysis. To tackle this problem, we propose a temporal enhanced Fisher vector encoding algorithm (TEFV) to provide more discriminative visual representation. Compared with FV, our TEFV model can not only preserve the temporal information of the entire action but also capture fine-grained spatial configurations and temporal dynamics. Moreover, we propose a two-stream framework (2sTEFV-GCN) by combining the TEFV model with the GCN model to further improve the performance. On two large-scale datasets for skeleton-based action recognition, NTU-RGB+D 60 and NTU-RGB+D 120, our model achieves state-of-the-art performance.

Funder

National Natural Science Foundation of China

Natural Science Foundation of Shandong Province

Fundamental Research Funds for the Central Universities

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics,Engineering (miscellaneous),Information Systems,Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s40747-022-00914-3.pdf

Reference55 articles.

1. Lin J, Gan C, Han S (2019) Tsm: Temporal shift module for efficient video understanding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 7083–7093

2. Tran D, Wang H, Torresani L, et al (2018) A closer look at spatiotemporal convolutions for action recognition. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp 6450–6459

3. Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. Adv Neural Inform Process Syst 27:568–576

4. Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional two-stream network fusion for video action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1933–1941

5. Wang L, Xiong Y, Wang Z et al (2018) Temporal segment networks for action recognition in videos. IEEE Trans Pattern Anal Mach Intell 41(11):2740–2755

Cited by 1 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Dual-attention Network for View-invariant Action Recognition;Complex & Intelligent Systems;2023-07-20