Enhancing Human Activity Recognition through Integrated Multimodal Analysis: A Focus on RGB Imaging, Skeletal Tracking, and Pose Estimation
Authors:
Sajid Ur Rehman 1 [ORCID], Aman Ullah Yasin 1, Ehtisham Ul Haq 1, Moazzam Ali 1 [ORCID], Jungsuk Kim 2,3 [ORCID], Asif Mehmood 2 [ORCID]
Affiliations:
1. Department of Creative Technologies, Air University, Islamabad 44000, Pakistan
2. Department of Biomedical Engineering, College of IT Convergence, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Republic of Korea
3. Research and Development Laboratory, Cellico Company, Seongnam-si 13449, Republic of Korea
Abstract
Human activity recognition (HAR) is pivotal in advancing applications ranging from healthcare monitoring to interactive gaming. Traditional HAR systems, which rely primarily on a single data source, are limited in capturing the full spectrum of human activities. This study introduces a comprehensive approach to HAR that integrates two critical modalities: RGB imaging and advanced pose estimation features. Our methodology leverages the strengths of each modality to overcome the drawbacks of unimodal systems, providing a richer and more accurate representation of activities. We propose a two-stream network that processes skeletal and RGB data in parallel, enhanced by pose estimation techniques for refined feature extraction. The two modalities are integrated through advanced fusion algorithms, significantly improving recognition accuracy. Extensive experiments on the UTD multimodal human action dataset (UTD-MHAD) demonstrate that the proposed approach outperforms existing state-of-the-art algorithms. This study not only sets a new benchmark for HAR systems but also highlights the importance of feature engineering, and of integrating optimal features, in capturing the complexity of human movements. Our findings pave the way for more sophisticated, reliable, and broadly applicable HAR systems in real-world scenarios.
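To make the two-stream idea concrete, the sketch below shows a minimal parallel RGB/skeleton network with concatenation-based late fusion, assuming a PyTorch implementation. The tiny CNN backbone, the MLP pose stream, and fusion by concatenation are illustrative assumptions, not the paper's exact architecture; only the dataset dimensions are taken from UTD-MHAD, which provides 27 action classes and 20-joint Kinect skeletons.

```python
import torch
import torch.nn as nn

class TwoStreamHAR(nn.Module):
    """Minimal two-stream HAR sketch: RGB stream + skeletal/pose stream, late fusion."""

    def __init__(self, num_classes: int = 27, num_joints: int = 20):
        super().__init__()
        # RGB stream: a tiny CNN standing in for any image backbone.
        self.rgb_stream = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),            # -> (B, 32)
        )
        # Skeletal/pose stream: an MLP over flattened (x, y, z) joint coordinates.
        self.pose_stream = nn.Sequential(
            nn.Flatten(),
            nn.Linear(num_joints * 3, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),                     # -> (B, 32)
        )
        # Fusion head: concatenate the two feature vectors, then classify.
        self.classifier = nn.Linear(32 + 32, num_classes)

    def forward(self, rgb: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.rgb_stream(rgb), self.pose_stream(pose)], dim=1)
        return self.classifier(fused)

# UTD-MHAD: 27 action classes, 20 Kinect skeleton joints per frame.
model = TwoStreamHAR(num_classes=27, num_joints=20)
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 20, 3))
print(logits.shape)  # torch.Size([4, 27])
```

Concatenation is the simplest late-fusion baseline; the "advanced fusion algorithms" the abstract refers to would replace the `torch.cat` step with a learned fusion module.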
Funder
National Research Foundation of Korea; Korea Institute of Industrial Technology Evaluation and Management