Enhancing Zero-Shot Action Recognition in Videos by Combining GANs with Text and Images

Published: 2023-05-05
Volume: 4, Issue: 4
ISSN: 2661-8907
Journal: SN Computer Science (SN COMPUT. SCI.)
Language: English

Authors:

Huang Kaiqiang, Miralles-Pechuán Luis, Mckeever Susan

Abstract

Zero-shot action recognition (ZSAR) tackles the problem of recognising actions that the model has not seen during the training phase. Various techniques have been used to achieve ZSAR in the field of human action recognition (HAR) in videos, and techniques based on generative adversarial networks (GANs) are the most promising in terms of performance. GANs are trained to generate representations of unseen videos conditioned on information related to the unseen classes, such as class-label embeddings. In this paper, we present an approach that combines information from two different GANs, both of which generate visual representations of unseen classes. Our dual-GAN approach leverages two separate knowledge sources related to the unseen classes: class-label texts, and images related to each class label obtained from Google Images. The visual embeddings of the unseen classes generated by the two GANs are merged and used to train a classifier in a supervised-learning fashion for ZSAR classification. Our methodology is based on the idea that using more, and richer, knowledge sources to generate representations of unseen classes will lead to higher downstream accuracy when classifying those classes. The experimental results show that our dual-GAN approach outperforms state-of-the-art methods on two benchmark HAR datasets: HMDB51 and UCF101. Additionally, we present a comprehensive discussion and analysis of the experimental results for both datasets to understand the nuances of each approach at the class level. Finally, we examine the impact of the number of visual embeddings generated by the two GANs on the accuracy of the models.
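The abstract describes a pipeline in which two conditional generators, one conditioned on class-label text and one on images of the class, each synthesize visual embeddings for unseen classes; those embeddings are merged and used to train an ordinary supervised classifier. The following is a minimal toy sketch of only that downstream merge-and-classify stage, under loudly stated assumptions: the trained GANs are replaced by a hypothetical stand-in sampler (`stand_in_generator`), "merging" is assumed to mean pooling both generators' samples into one training set, the classifier is a plain softmax regression, and the class names, feature size and data are all invented. It is not the authors' implementation.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Sample = Tuple[Vector, str]

FEAT_DIM = 8  # toy visual-feature size (assumption, not from the paper)
UNSEEN = ["archery", "juggling", "surfing"]  # hypothetical unseen classes


def stand_in_generator(centers: Dict[str, Vector], noise: float,
                       seed: int) -> Callable[[str], Vector]:
    """Hypothetical stand-in for a trained conditional generator G(z, c).

    A real generator would map noise z plus a semantic embedding c (a
    class-label text embedding or an image embedding) to a visual feature.
    Here it simply samples around the class center this "GAN" has learned.
    """
    rng = random.Random(seed)

    def generate(label: str) -> Vector:
        return [m + rng.gauss(0.0, noise) for m in centers[label]]

    return generate


def synthesize(gen: Callable[[str], Vector], n_per_class: int) -> List[Sample]:
    """Generate n_per_class labelled synthetic features for every unseen class."""
    return [(gen(c), c) for c in UNSEEN for _ in range(n_per_class)]


def logits(W: List[Vector], b: Vector, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]


def train_softmax(data: List[Sample], epochs: int = 30, lr: float = 0.05,
                  seed: int = 0) -> Tuple[List[Vector], Vector]:
    """Multinomial logistic regression trained with per-sample SGD."""
    k = len(UNSEEN)
    W = [[0.0] * FEAT_DIM for _ in range(k)]
    b = [0.0] * k
    data = list(data)  # copy so shuffling does not mutate the caller's list
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            z = logits(W, b, x)
            top = max(z)
            exps = [math.exp(v - top) for v in z]  # numerically stable softmax
            total = sum(exps)
            for c in range(k):
                # Cross-entropy gradient w.r.t. logit c: p_c - 1[c == y].
                grad = exps[c] / total - (1.0 if UNSEEN[c] == y else 0.0)
                for j in range(FEAT_DIM):
                    W[c][j] -= lr * grad * x[j]
                b[c] -= lr * grad
    return W, b


def predict(W: List[Vector], b: Vector, x: Vector) -> str:
    z = logits(W, b, x)
    return UNSEEN[z.index(max(z))]


def run(n_per_class: int = 50, seed: int = 0) -> float:
    rng = random.Random(seed)
    # Toy "true" class centers in visual-feature space (assumption).
    true_centers = {c: [4.0 if j == i else 0.0 for j in range(FEAT_DIM)]
                    for i, c in enumerate(UNSEEN)}

    def biased(shift: float) -> Dict[str, Vector]:
        # Each stand-in GAN has learned a slightly offset version of the centers.
        return {c: [m + rng.gauss(0.0, shift) for m in v]
                for c, v in true_centers.items()}

    text_gan = stand_in_generator(biased(0.3), noise=1.0, seed=seed + 1)
    image_gan = stand_in_generator(biased(0.3), noise=1.0, seed=seed + 2)

    # Merge (assumed here to mean pooling): one training set from both GANs.
    train = synthesize(text_gan, n_per_class) + synthesize(image_gan, n_per_class)
    W, b = train_softmax(train, seed=seed)

    # Evaluate on toy "real" unseen-class features never used in training.
    test = [([m + rng.gauss(0.0, 1.0) for m in true_centers[c]], c)
            for c in UNSEEN for _ in range(100)]
    correct = sum(predict(W, b, x) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    for n in (5, 50):
        print(f"{n} synthetic features per class per GAN -> accuracy {run(n):.2f}")
```

The `n_per_class` parameter loosely mirrors the paper's question of how many generated embeddings to use per class; accuracies from this toy setup say nothing about HMDB51 or UCF101.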

Funder

Fiosraigh Scholarship of Technological University Dublin

Technological University Dublin

Publisher

Springer Science and Business Media LLC

Subject

Computer Science Applications, Computer Networks and Communications, Computer Graphics and Computer-Aided Design, Computational Theory and Mathematics, Artificial Intelligence, General Computer Science

Link

https://link.springer.com/content/pdf/10.1007/s42979-023-01803-3.pdf

Cited by 3 articles

1. Domain-Adaptive and Context-Aware Fall Detection Based on Coarse-Fine Network Learning. International Journal of Innovative Science and Research Technology (IJISRT). 2024-05-23.

2. Generalized Zero-Shot Learning for Action Recognition Fusing Text and Image GANs. IEEE Access. 2024.

3. Zero-shot action recognition by clustered representation with redundancy-free features. Machine Vision and Applications. 2023-10-09.