An optimization method of human skeleton keyframes selection for action recognition-Reference-Cited by-同舟云学术

An optimization method of human skeleton keyframes selection for action recognition

Published:2024-03-30 Issue:4 Volume:10 Page:4659-4673
ISSN:2199-4536
Container-title:Complex & Intelligent Systems
language:en
Short-container-title:Complex Intell. Syst.

Author:

Chen Hao^ORCID,Pan Yuekai,Wang Chenwu

Abstract

AbstractIn the action recognition field based on the characteristics of human skeleton joint points, the selection of keyframes in the skeleton sequence is a significant issue, which directly affects the action recognition accuracy. In order to improve the effectiveness of keyframes selection, this paper proposes inflection point frames, and transforms keyframes selection into a multi-objective optimization problem based on it. First, the pose features are extracted from the input skeleton joint point data, which used to construct the pose feature vector of each frame in time sequence; then, the inflection point frames in the sequence are determined according to the flow of momentum of each body part. Next, the pose feature vectors are input into the keyframes multi-objective optimization model, with the fusion of domain information and the number of keyframes; finally, the output keyframes are input to the action classifier. To verify the effectiveness of the method, the MSR-Action3D, the UTKinect-Action and Florence3D-Action, and the 3 public datasets, are chosen for simulation experiments and the results show that the keyframes sequence obtained by this method can significantly improve the accuracy of multiple action classifiers, and the average recognition accuracy of the three data sets can reach 94.6%, 97.6% and 94.2% respectively. Besides, combining the optimized keyframes with deep learning classifier on the NTU RGB + D dataset can make the accuracies reaching 83.2% and 93.7%.

Funder

National Natural Science Foundation of China

Key Research and Development Program of Jiangxi Province

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s40747-024-01403-5.pdf

Reference47 articles.

1. Dang LM et al (2020) Sensor-based and vision-based human activity recognition: a comprehensive survey. Pattern Recogn 108:107561

2. Elias P, Sedmidubsky J, Zezula P (2019) Understanding the gap between 2D and 3D skeleton-based action recognition. In: 2019 IEEE International Symposium on Multimedia (ISM), pp 192–195

3. Xuan TN, Ngo TD, Le TH (2019) A Spatial-temporal 3D human pose reconstruction framework. J Inform Process Syst 15(2):399–409

4. Lillo I, Soto A, Niebles JC (2014) Discriminative hierarchical modeling of spatio-temporally composable human activities. In: Proceedings of the IEEE International Conference on computer vision, p 812–819

5. Bobick AF, Davis JW (2001) The recognition of human movement using temporal templates. IEEE Trans Pattern Anal Mach Intell 23(3):257–267