Relational Action Bank with Semantic–Visual Attention for Few-Shot Action Recognition

Published:2023-03-03 Issue:3 Volume:15 Page:101
ISSN:1999-5903
Container-title:Future Internet
language:en
Short-container-title:Future Internet

Author:

Liang Haoming¹, Du Jinze¹, Zhang Hongchen¹, Han Bing¹, Ma Yan²

Affiliation:

1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China

2. State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing 100191, China

Abstract

Few-shot learning has recently attracted significant attention in video action recognition owing to its data-efficient learning paradigm. Despite encouraging progress, how to further improve few-shot performance by exploiting additional or auxiliary information for video action recognition remains an open challenge. To address this problem, we propose a relational action bank with semantic–visual attention for few-shot action recognition, which to our knowledge is the first attempt of this kind. Specifically, we introduce a relational action bank as an auxiliary library that helps the network understand the actions in novel classes. In addition, the semantic–visual attention is devised to adaptively capture connections to the foregone actions through both semantic correlation and visual similarity. We extensively evaluate our approach with two backbone models (ResNet-50 and C3D) on the HMDB and Kinetics datasets and show that the proposed model performs significantly better than state-of-the-art methods. Notably, our results show an average improvement of about 6.2% over the second-best method on the Kinetics dataset.

Funder

Science and Technology Planning Project of Gansu Province, China

Publisher

MDPI AG

Subject

Computer Networks and Communications

Link

https://www.mdpi.com/1999-5903/15/3/101/pdf

References: 87 articles.

1. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Fei-Fei, L. (2014, January 23–28). Large-scale Video Classification with Convolutional Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.

2. Carreira, J., Noland, E., Hillier, C., and Zisserman, A. (2019). A short note on the kinetics-700 human action dataset. arXiv.

3. Koch, G., Zemel, R., and Salakhutdinov, R. (2015, January 6–11). Siamese neural networks for one-shot image recognition. Proceedings of the ICML Deep Learning Workshop, Lille, France.

4. Vinyals, O., Blundell, C., Lillicrap, T., and Wierstra, D. (2016, January 5–10). Matching networks for one shot learning. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain.

5. Snell, J., Swersky, K., and Zemel, R. (2017, January 4–9). Prototypical networks for few-shot learning. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA.