Enhancing Human Activity Recognition through Integrated Multimodal Analysis: A Focus on RGB Imaging, Skeletal Tracking, and Pose Estimation
Authors:
Sajid Ur Rehman 1 [ORCID], Aman Ullah Yasin 1, Ehtisham Ul Haq 1, Moazzam Ali 1 [ORCID], Jungsuk Kim 2,3 [ORCID], Asif Mehmood 2 [ORCID]
Affiliations:
1. Department of Creative Technologies, Air University, Islamabad 44000, Pakistan
2. Department of Biomedical Engineering, College of IT Convergence, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Republic of Korea
3. Research and Development Laboratory, Cellico Company, Seongnam-si 13449, Republic of Korea
Abstract
Human activity recognition (HAR) is pivotal in advancing applications ranging from healthcare monitoring to interactive gaming. Traditional HAR systems, which rely primarily on a single data source, are limited in capturing the full spectrum of human activities. This study introduces a comprehensive approach to HAR that integrates two critical modalities: RGB imaging and advanced pose estimation features. Our methodology leverages the strengths of each modality to overcome the drawbacks of unimodal systems, providing a richer and more accurate representation of activities. We propose a two-stream network that processes skeletal and RGB data in parallel, enhanced by pose estimation techniques for refined feature extraction. The two modalities are integrated through advanced fusion algorithms, significantly improving recognition accuracy. Extensive experiments on the UTD multimodal human action dataset (UTD-MHAD) demonstrate that the proposed approach outperforms existing state-of-the-art algorithms. This study not only sets a new benchmark for HAR systems but also highlights the importance of feature engineering, and of integrating optimal features, in capturing the complexity of human movements. Our findings pave the way for more sophisticated, reliable, and broadly applicable HAR systems in real-world scenarios.
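To make the two-stream idea concrete, the sketch below shows a minimal parallel RGB/skeleton network with concatenation-based late fusion, assuming a PyTorch implementation. The tiny CNN backbone, the MLP pose stream, and fusion by concatenation are illustrative assumptions, not the paper's exact architecture; only the dataset dimensions are taken from UTD-MHAD, which provides 27 action classes and 20-joint Kinect skeletons.

```python
import torch
import torch.nn as nn

class TwoStreamHAR(nn.Module):
    """Minimal two-stream HAR sketch: RGB stream + skeletal/pose stream, late fusion."""

    def __init__(self, num_classes: int = 27, num_joints: int = 20):
        super().__init__()
        # RGB stream: a tiny CNN standing in for any image backbone.
        self.rgb_stream = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),            # -> (B, 32)
        )
        # Skeletal/pose stream: an MLP over flattened (x, y, z) joint coordinates.
        self.pose_stream = nn.Sequential(
            nn.Flatten(),
            nn.Linear(num_joints * 3, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),                     # -> (B, 32)
        )
        # Fusion head: concatenate the two feature vectors, then classify.
        self.classifier = nn.Linear(32 + 32, num_classes)

    def forward(self, rgb: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.rgb_stream(rgb), self.pose_stream(pose)], dim=1)
        return self.classifier(fused)

# UTD-MHAD: 27 action classes, 20 Kinect skeleton joints per frame.
model = TwoStreamHAR(num_classes=27, num_joints=20)
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 20, 3))
print(logits.shape)  # torch.Size([4, 27])
```

Concatenation is the simplest late-fusion baseline; the "advanced fusion algorithms" the abstract refers to would replace the `torch.cat` step with a learned fusion module.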
Funder
National Research Foundation of Korea; Korea Institute of Industrial Technology Evaluation and Management