Deep Learning Based Human Activity Recognition Using Spatio-Temporal Image Formation of Skeleton Joints

Published:2021-03-17 Issue:6 Volume:11 Page:2675
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Tasnim Nusrat, Islam Mohammad Khairul, Baek Joong-Hwan

Abstract

Human activity recognition has become a significant research trend in the fields of computer vision, image processing, and human–machine or human–object interaction due to its relevance to cost-effectiveness, time management, rehabilitation, and disease pandemics. Over the past years, several methods have been published for human action recognition using RGB (red, green, and blue), depth, and skeleton datasets. Most of the methods introduced for action classification using skeleton datasets are constrained in some respects, including feature representation, complexity, and performance. Providing an effective and efficient method for human action discrimination using a 3D skeleton dataset therefore remains a challenging problem. There is considerable room to map the 3D skeleton joint coordinates into spatio-temporal formats in order to reduce the complexity of the system, recognize human behaviors more accurately, and improve the overall performance. In this paper, we propose a spatio-temporal image formation (STIF) technique for 3D skeleton joints that captures spatial information and temporal changes for action discrimination. We apply transfer learning (the pretrained models MobileNetV2, DenseNet121, and ResNet18, trained on the ImageNet dataset) to extract discriminative features and evaluate the proposed method with several fusion techniques. We mainly investigate the effect of three fusion methods (element-wise average, multiplication, and maximization) on human action recognition performance. Our deep learning-based method with STIF representation outperforms prior works on UTD-MHAD (University of Texas at Dallas multi-modal human action dataset) and MSR-Action3D (Microsoft action 3D), two publicly available benchmark 3D skeleton datasets. We attain accuracies of approximately 98.93%, 99.65%, and 98.80% for UTD-MHAD and 96.00%, 98.75%, and 97.08% for MSR-Action3D skeleton datasets using MobileNetV2, DenseNet121, and ResNet18, respectively.
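The three fusion methods named in the abstract (element-wise average, multiplication, and maximization) can be illustrated with a minimal sketch. The abstract does not say what is fused or how many streams there are, so the sketch assumes that each stream yields one per-class score vector and that the fused vector is reduced with an argmax. The function names `fuse` and `predict` and the example scores are hypothetical and are not taken from the paper.

```python
from math import prod


def fuse(score_vectors, method="average"):
    """Fuse equal-length per-class score vectors element-wise.

    Assumption: each vector holds one stream's class scores.
    """
    if not score_vectors:
        raise ValueError("need at least one score vector")
    length = len(score_vectors[0])
    if any(len(v) != length for v in score_vectors):
        raise ValueError("all score vectors must have the same length")

    # Regroup so that each tuple holds every stream's score for one class.
    per_class = list(zip(*score_vectors))

    if method == "average":
        return [sum(scores) / len(scores) for scores in per_class]
    if method == "multiplication":
        return [prod(scores) for scores in per_class]
    if method == "maximization":
        return [max(scores) for scores in per_class]
    raise ValueError(f"unknown fusion method: {method}")


def predict(score_vectors, method="average"):
    """Return the index of the highest fused score (first index on ties)."""
    fused = fuse(score_vectors, method)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores from three streams over three action classes.
stream_a = [0.5, 0.25, 0.25]
stream_b = [0.25, 0.5, 0.25]
stream_c = [0.5, 0.25, 0.25]

for name in ("average", "multiplication", "maximization"):
    streams = [stream_a, stream_b, stream_c]
    print(name, fuse(streams, name), predict(streams, name))
```

Averaging smooths disagreement between streams, multiplication rewards classes that every stream scores highly, and maximization lets the single most confident stream dominate. Which of these works best is an empirical question that the paper answers through its own experiments.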

Funder

Gyeonggi province, Korea

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Link

https://www.mdpi.com/2076-3417/11/6/2675/pdf

References: 68 articles.

1. Continuous Human Action Recognition Using Depth-MHI-HOG and a Spotter Model

2. Continuous detection and recognition of actions of interest among actions of non-interest using a depth camera

3. Structured Feature Learning for Pose Estimation

4. Semantic human activity recognition: A literature review

5. A Vision-Based System for Intelligent Monitoring: Human Behaviour Analysis and Privacy by Context

Cited by 38 articles.

1. Video Surveillance System-Based Human Activity Recognition Using Hierarchical Auto-Associative Polynomial Convolutional Neural Network with Garra Rufa Fish Optimization;International Journal of Pattern Recognition and Artificial Intelligence;2024-07-25

2. Hybrid semantics-based vulnerability detection incorporating a Temporal Convolutional Network and Self-attention Mechanism;Information and Software Technology;2024-07

3. Human Activity Recognition Through Images Using a Deep Learning Approach;2024-05-30

4. A hybrid deep learning framework for daily living human activity recognition with cluster-based video summarization;Multimedia Tools and Applications;2024-04-18

5. PAR-Net: An Enhanced Dual-Stream CNN–ESN Architecture for Human Physical Activity Recognition;Sensors;2024-03-16