Reconstructed Prototype Network Combined with CDC-TAGCN for Few-Shot Action Recognition
Published: 2023-10-12
Issue: 20
Volume: 13
Page: 11199
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Affiliation:
1. School of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
Abstract
Research on few-shot action recognition has received widespread attention recently. However, there are some blind spots in the current research: (1) The prevailing practice in many models is to assign uniform weights to all samples; however, this approach can harm the model when high-noise samples are present. (2) Samples with similar features but different classes are difficult for the model to distinguish. (3) Skeleton data harbors rich temporal features, but most encoders struggle to extract them effectively. In response to these challenges, this study introduces a reconstructed prototype network (RC-PN) based on the prototype network framework, together with a novel spatiotemporal encoder. RC-PN comprises two enhanced modules: sample coefficient reconstruction (SCR) and a reconstruction loss function (LRC). SCR leverages cosine similarity between samples to reassign sample weights, thereby generating prototypes that are robust to noise interference and better convey the conceptual essence of each class. At the same time, the introduction of LRC enhances feature similarity among samples of the same class while increasing feature distinctiveness between different classes. On the encoder side, this study introduces a novel spatiotemporal convolutional encoder called CDC-TAGCN, in which the temporal convolution operator is redefined. The vanilla temporal convolution operator captures only the surface-level characteristics of action samples. Drawing inspiration from central difference convolution (CDC), this research enhances the TCN to CDC-TCN, which fuses difference-based features of the action samples into the features extracted by the vanilla convolution operator. Extensive feasibility and ablation experiments are performed on the skeleton action datasets NTU RGB+D 120 and Kinetics, and the results are compared with recent research.
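As a rough illustration of the SCR idea summarized above, the sketch below reweights the support samples of each class by their mean cosine similarity to the other samples of that class before averaging them into a prototype, so that noisy samples receive smaller coefficients. The tensor shapes, softmax temperature, and function name are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def reconstruct_prototypes(support, temperature=1.0):
    """support: [n_way, k_shot, dim] -> class prototypes: [n_way, dim]."""
    n_way, k_shot, dim = support.shape
    feats = F.normalize(support, dim=-1)              # unit-norm features for cosine similarity
    sim = torch.einsum('nkd,njd->nkj', feats, feats)  # pairwise cosine similarity within each class
    # Mean similarity of each sample to its classmates (excluding the self-similarity of 1):
    # samples that agree with the rest of the class get larger coefficients, noisy ones smaller.
    score = (sim.sum(dim=-1) - 1.0) / max(k_shot - 1, 1)
    weights = F.softmax(score / temperature, dim=-1)  # [n_way, k_shot] reconstruction coefficients
    return (weights.unsqueeze(-1) * support).sum(dim=1)

# usage: a 5-way 5-shot episode with 256-dimensional features
prototypes = reconstruct_prototypes(torch.randn(5, 5, 256))
```

Similarly, the following is a minimal sketch of a central-difference-style temporal convolution in the spirit of the CDC-based temporal operator described in the abstract: the vanilla temporal convolution output is combined with a difference term controlled by a mixing factor theta. The class name, kernel size, and theta value are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDCTemporalConv(nn.Module):
    """Temporal convolution over skeleton features x: [N, C, T, V] (T frames, V joints)."""
    def __init__(self, channels, kernel_size=9, theta=0.7):
        super().__init__()
        pad = (kernel_size - 1) // 2
        self.conv = nn.Conv2d(channels, channels, (kernel_size, 1),
                              padding=(pad, 0), bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)  # vanilla temporal convolution
        # Difference term: the summed kernel applied to the centre frame; subtracting it
        # emphasises frame-to-frame changes rather than absolute feature values.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)  # [C, C, 1, 1]
        out_diff = F.conv2d(x, kernel_sum)
        return out - self.theta * out_diff

# usage: batch of 8 sequences, 64 channels, 50 frames, 25 joints
y = CDCTemporalConv(64)(torch.randn(8, 64, 50, 25))
```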
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science