Author:
Kumar Neelam Sanjeev,Deepika G.,Goutham V.,Buvaneswari B.,Reddy R. Vijaya Kumar,Angadi Sanjeevkumar,Dhanamjayulu C.,Chinthaginjala Ravikumar,Mohammad Faruq,Khan Baseem
Abstract
AbstractA comprehensive examination of human action recognition (HAR) methodologies situated at the convergence of deep learning and computer vision is the subject of this article. We examine the progression from handcrafted feature-based approaches to end-to-end learning, with a particular focus on the significance of large-scale datasets. By classifying research paradigms, such as temporal modelling and spatial features, our proposed taxonomy illuminates the merits and drawbacks of each. We specifically present HARNet, an architecture for Multi-Model Deep Learning that integrates recurrent and convolutional neural networks while utilizing attention mechanisms to improve accuracy and robustness. The VideoMAE v2 method (https://github.com/OpenGVLab/VideoMAEv2) has been utilized as a case study to illustrate practical implementations and obstacles. For researchers and practitioners interested in gaining a comprehensive understanding of the most recent advancements in HAR as they relate to computer vision and deep learning, this survey is an invaluable resource.
Publisher
Springer Science and Business Media LLC
Reference37 articles.
1. Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., & Van Gool, L. Temporal Segment networks: Towards good practices for deep action recognition. In European Conference on Computer Vision (ECCV) 20–36 (2016).
2. Simonyan, K., & Zisserman, A. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems (NeurIPS) 568–576 (2014).
3. Carreira, J., & Zisserman, A. Quo Vadis, action recognition? A new model and the kinetics dataset. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4724–4733 (2017).
4. Feichtenhofer, C., Pinz, A., & Wildes, R. Spatiotemporal residual networks for video action recognition. In Advances in Neural Information Processing Systems (NeurIPS) 3431–3439 (2016).
5. Gupta, N. et al. Human activity recognition in artificial intelligence framework: A narrative review. Artif Intell Rev 55, 4755–4808 (2022).