Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition

Published: 2023-10-26
Container-title: Proceedings of the 31st ACM International Conference on Multimedia

Author:

Ma Yujun¹, Zhou Benjia², Wang Ruili¹, Wang Pichao³

Affiliation:

1. Dalian Maritime University & Massey University, Dalian, China

2. Macau University of Science and Technology, Macau SAR, China

3. Amazon Prime Video, Seattle, WA, USA

Funder

National Key Research and Development Plan

Guangdong Provincial Key R&D Programme

Science and Technology Development Fund of Macau Project

External Cooperation Key Project of the Chinese Academy of Sciences

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3581783.3612301

References: 18 articles.

1. XB Bruce, Yan Liu, and Keith CC Chan. 2021. Multimodal fusion via teacher-student network for indoor action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 3199–3207.

2. XB Bruce, Yan Liu, Xiang Zhang, Sheng-hua Zhong, and Keith CC Chan. 2022. Mmnet: A model-based multimodal network for human action recognition in rgb-d videos. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).

3. Cross-Modality Compensation Convolutional Neural Networks for RGB-D Action Recognition

4. Srijan Das, Saurav Sharma, Rui Dai, Francois Bremond, and Monique Thonnat. 2020. Vpn: Learning video-pose embedding for activities of daily living. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part IX 16. Springer, 72–90.

5. MTT: Multi-Scale Temporal Transformer for Skeleton-Based Action Recognition

Cited by 2 articles.

1. k-NN attention-based video vision transformer for action recognition;Neurocomputing;2024-03

2. ConTrans-Detect: A Multi-Scale Convolution-Transformer Network for DeepFake Video Detection;2023 29th International Conference on Mechatronics and Machine Vision in Practice (M2VIP);2023-11-21