Abstract
Because sign languages vary widely, it is essential to build a model that can recognize hand gestures reliably. State-of-the-art models are mainly driven by convolutional neural networks (CNNs), and research has focused on optimizing CNN architectures. However, these networks are large and take a long time to train. To address these challenges, we developed a more accurate and robust ECAPA-TDNN structure for recognition. The ECAPA-TDNN stacks multiple one-dimensional blocks, each combining one-dimensional convolution, activation layers, and batch normalization. On the challenging SHREC 2017 3D Shape Retrieval Contest dataset, the ECAPA-TDNN achieved an accuracy of 92.9%, which is 2% higher than the state-of-the-art accuracy achieved by CNNs.
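The abstract describes the ECAPA-TDNN as a stack of one-dimensional convolution, activation, and batch-normalization layers. The sketch below, in plain NumPy, illustrates one such TDNN-style block on a toy input; the layer shapes, kernel size, and ordering (conv, then ReLU, then per-channel normalization) are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1D convolution: x is (C_in, T), w is (C_out, C_in, K), b is (C_out,)."""
    c_out, c_in, k = w.shape
    t_out = x.shape[1] - k + 1
    y = np.zeros((c_out, t_out))
    for o in range(c_out):
        for t in range(t_out):
            y[o, t] = np.sum(w[o] * x[:, t:t + k]) + b[o]
    return y

def channel_norm(x, eps=1e-5):
    """Batch-norm-style per-channel normalization over the time axis."""
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def tdnn_block(x, w, b):
    """One TDNN-style block: 1D convolution -> ReLU activation -> normalization."""
    return channel_norm(np.maximum(conv1d(x, w, b), 0.0))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 20))    # toy input: 4 channels, 20 time frames
w = rng.standard_normal((8, 4, 3))  # 8 output channels, kernel size 3
b = np.zeros(8)
y = tdnn_block(x, w, b)
print(y.shape)  # (8, 18): kernel size 3 shrinks 20 frames to 18
```

A full ECAPA-TDNN additionally uses dilated convolutions, squeeze-excitation, and attentive statistics pooling; this block only shows the conv/activation/normalization pattern the abstract names.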
Publisher
Darcy & Roy Press Co. Ltd.