Multimodal vision-based human action recognition using deep learning: a review-Reference-Cited by-同舟云学术

Multimodal vision-based human action recognition using deep learning: a review

Published:2024-06-19 Issue:7 Volume:57 Page:
ISSN:1573-7462
Container-title:Artificial Intelligence Review
language:en
Short-container-title:Artif Intell Rev

Author:

Shafizadegan Fatemeh,Naghsh-Nilchi Ahmad R.,Shabaninia Elham

Abstract

AbstractVision-based Human Action Recognition (HAR) is a hot topic in computer vision. Recently, deep-based HAR has shown promising results. HAR using a single data modality is a common approach; however, the fusion of different data sources essentially conveys complementary information and improves the results. This paper comprehensively reviews deep-based HAR methods using multiple visual data modalities. The main contribution of this paper is categorizing existing methods into four levels, which provides an in-depth and comparable analysis of approaches in various aspects. So, at the first level, proposed methods are categorized based on the employed modalities. At the second level, methods categorized in the first level are classified based on the employment of complete modalities or working with missing modalities at the test time. At the third level, complete and missing modality branches are categorized based on existing approaches. Finally, similar frameworks in the third category are grouped together. In addition, a comprehensive comparison is provided for publicly available benchmark datasets, which helps to compare and choose suitable datasets for a task or to develop new datasets. This paper also compares the performance of state-of-the-art methods on benchmark datasets. The review concludes by highlighting several future directions.

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10462-024-10730-5.pdf

Reference302 articles.

1. Adewopo V, Elsayed N, ElSayed Z, etal (2022) Review on action recognition for accident detection in smart city transportation systems. arXiv preprint arXiv:2208.09588https://doi.org/10.48550/arXiv.2208.09588

2. Adhikari K, Bouchachia H, Nait-Charif H (2017) Activity recognition for indoor fall detection using convolutional neural network. In: 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA), IEEE, pp 81–84, https://doi.org/10.23919/mva.2017.7986795

3. Ahmad T, Jin L, Zhang X et al (2021) Graph convolutional neural network for human action recognition: a comprehensive survey. IEEE Trans Artif Intell 2(2):128–145. https://doi.org/10.1109/tai.2021.3076974

4. Ahn D, Kim S, Ko BC (2023) Star++: rethinking spatio-temporal cross attention transformer for video action recognition. Appl Intell. https://doi.org/10.1007/s10489-023-04978-7

5. Akkaladevi SC, Heindl C (2015) Action recognition for human robot interaction in industrial applications. In: 2015 IEEE International Conference on Computer Graphics Vision and Information Security (CGVIS). IEEE, pp 94–99, https://doi.org/10.1109/cgvis.2015.7449900