Abstract
AbstractDynamic gesture recognition has become a new type of interaction to meet the needs of daily interaction. It is the most natural, easy to operate, and intuitive, so it has a wide range of applications. The accuracy of gesture recognition depends on the ability to accurately learn the short-term and long-term spatiotemporal features of gestures. Our work is different from improving the performance of a single type of network with convnets-based models and recurrent neural network-based models or serial stacking of two heterogeneous networks, we proposed a fusion architecture that can simultaneously learn short-term and long-term spatiotemporal features of gestures, which combined convnets-based models and recurrent neural network-based models in parallel. At each stage of feature learning, the short-term and long-term spatiotemporal features of gestures are captured simultaneously, and the contribution of two heterogeneous networks to the classification results in spatial and channel axes that can be learned automatically by using the attention mechanism. The sequence and pooling operation of the channel attention module and spatial attention module are compared through experiments. And the proportion of short-term and long-term features of gestures on channel and spatial axes in each stage of feature learning is quantitatively analyzed, and the final model is determined according to the experimental results. The module can be used for end-to-end learning and the proposed method was validated on the EgoGesture, SKIG, and IsoGD datasets and got very competitive performance.
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics,Engineering (miscellaneous),Information Systems,Artificial Intelligence
Reference49 articles.
1. Lien J, Gillian N, Karagozler ME, Amihood P, Schwesig C, Olson E, Raja H, Poupyrev I (2016) Soli: ubiquitous gesture sensing with millimeter wave radar. ACM Trans Graph 35(4):1–19
2. Nymoen K, Haugen MR, Jensenius AR (2015) Mumyo–evaluating and exploring the myo armband for musical interaction. In: Proceedings of the international conference on new interfaces for musical expression
3. Parcheta Z, Martínez-Hinarejos C-D (2017) Sign language gesture recognition using HMM. In: Iberian conference on pattern recognition and image analysis. Springer, pp.419–426
4. Wieczorek M, Sika J, Wozniak M, Garg S, Hassan M (2021) Lightweight CNN model for human face detection in risk situations. IEEE Trans Ind Inf 18(7):4820–4829
5. Basak H, Kundu R, Singh PK, Ijaz MF, Woźniak M, Sarkar R (2022) A union of deep learning and swarm-based optimization for 3D human action recognition. Sci Rep 12(1):1–17
Cited by
9 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献