Dual-attention Network for View-invariant Action Recognition

Published: 2023-07-20
ISSN: 2199-4536
Container-title: Complex & Intelligent Systems
Language: en
Short-container-title: Complex Intell. Syst.

Author:

Kumie Gedamu Alemu, Habtie Maregu Assefa, Ayall Tewodros Alemu, Zhou Changjun, Liu Huawen, Seid Abegaz Mohammed, Erbad Aiman

Abstract

View-invariant action recognition has been widely researched in various applications, such as visual surveillance and human–robot interaction. However, view-invariant human action recognition is challenging due to action occlusions and information loss caused by view changes. Modeling the spatiotemporal dynamics of body joints and minimizing the representation discrepancy between different views could be a valuable solution for view-invariant human action recognition. Therefore, we propose a Dual-Attention Network (DANet) that aims to learn robust video representations for view-invariant action recognition. The DANet is composed of relation-aware spatiotemporal self-attention and spatiotemporal cross-attention modules. The relation-aware spatiotemporal self-attention module learns representative and discriminative action features. This module captures local and global long-range dependencies, as well as pairwise relations among human body parts and joints in the spatial and temporal domains. The cross-attention module learns view-invariant attention maps and generates discriminative features for semantic representations of actions in different views. We exhaustively evaluate our proposed approach on the large-scale, challenging NTU-60, NTU-120, and UESTC datasets with multiple evaluation protocols, including Cross-Subject, Cross-View, Cross-Set, and Arbitrary-view. The experimental results demonstrate that our proposed approach significantly outperforms state-of-the-art approaches in view-invariant action recognition.
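
The abstract does not give the internals of the attention modules, so the following is only a minimal sketch of generic scaled dot-product cross-attention between features from two camera views, not the DANet implementation. The function names (`softmax`, `cross_attention`), the toy feature values, and the absence of learned projection weights are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention (illustrative, no learned weights).

    queries: features from one view, shape [n_q][d]
    keys:    features from the other view, shape [n_k][d]
    values:  features from the other view, shape [n_k][d_v]
    Returns (attended features [n_q][d_v], attention map [n_q][n_k]).
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    attn_map = []
    for q in queries:
        # Similarity of this query token to every key token of the other view.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        attn_map.append(softmax(scores))
    d_v = len(values[0])
    # Each output token is an attention-weighted sum of the value tokens.
    attended = [
        [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        for weights in attn_map
    ]
    return attended, attn_map


# Hypothetical toy features: 2 tokens from view A, 3 tokens from view B.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn_map = cross_attention(view_a, view_b, view_b)
```

Passing the same feature list as queries, keys, and values turns the same function into plain self-attention; the relation-aware terms described in the abstract are not modeled in this sketch.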

Funder

Postdoctoral Foundation of Zhejiang Normal University

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics,Engineering (miscellaneous),Information Systems,Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s40747-023-01171-8.pdf

Cited by 1 article.

1. An efficient and lightweight multiperson activity recognition framework for robot-assisted healthcare applications. Expert Systems with Applications, 2024-05.