Various frameworks for integrating image and video streams for spatiotemporal information learning employing 2D-3D Residual networks for human action recognition

Published: 2023-08-28

Author:

Yosry Shaimaa¹, Elrefaei Lamiaa¹, Ziedan Rania¹

Affiliation:

1. Benha University

Abstract

Human action recognition is an important research topic in computer vision because human actions are an essential form of communication and interaction between humans and computers. Its goal is to help computers automatically recognize human behaviors and accurately comprehend human intentions. Inspired by research on keyframe extraction and multi-feature fusion, this paper improves action recognition accuracy by fusing keyframe features with video features. We propose a novel multi-stream architecture made up of two distinct models combined using different fusion techniques. The first model combines a two-dimensional convolutional neural network (2D-CNN) with Long Short-Term Memory (LSTM) networks to extract long-term spatial and temporal features from video keyframe images. The second model is a three-dimensional convolutional neural network (3D-CNN) that extracts short-term spatiotemporal features from video clips. We then apply Early Fusion and Late Fusion to the two models to recognize human actions from video. The method was evaluated on two widely used action recognition benchmarks, HMDB-51 and UCF-101. The Early-Fusion (EF) strategy achieved accuracies of 70.2% on HMDB-51 and 95.5% on UCF-101, while the Late-Fusion (LF) strategy achieved 77.2% and 97.5%, respectively.
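The difference between the two fusion strategies can be illustrated with a minimal sketch. The abstract does not specify the fusion operators, so this example assumes the common textbook forms: early fusion as concatenation of per-stream feature vectors before a shared classifier, and late fusion as a weighted average of per-stream class probabilities. The function names, the `keyframe_weight` parameter, and the toy inputs are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_fusion(keyframe_feat: Sequence[float],
                 clip_feat: Sequence[float]) -> List[float]:
    """Assumed early fusion: concatenate the two streams' feature vectors.

    The joint vector would then be passed to a single shared classifier.
    """
    return list(keyframe_feat) + list(clip_feat)


def late_fusion(keyframe_logits: Sequence[float],
                clip_logits: Sequence[float],
                keyframe_weight: float = 0.5) -> List[float]:
    """Assumed late fusion: weighted average of per-stream probabilities.

    keyframe_weight is a hypothetical mixing parameter in [0, 1].
    """
    if len(keyframe_logits) != len(clip_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= keyframe_weight <= 1.0:
        raise ValueError("keyframe_weight must lie in [0, 1]")
    p_key = softmax(keyframe_logits)
    p_clip = softmax(clip_logits)
    return [keyframe_weight * a + (1.0 - keyframe_weight) * b
            for a, b in zip(p_key, p_clip)]


def predict(probs: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])
```

In this sketch, `late_fusion([2.0, 0.0, 0.0], [0.0, 1.0, 0.0])` combines the outputs of two independently trained streams, whereas `early_fusion` only builds the joint feature vector and leaves classification to a downstream model.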

Publisher

Research Square Platform LLC
