AiPE: A Novel Transformer-Based Pose Estimation Method
Published: 2024-03-02
Volume: 13, Issue: 5, Page: 967
ISSN: 2079-9292
Container-title: Electronics
Short-container-title: Electronics
Language: en
Affiliation:
1. Department of Computer Science and Engineering, College of Engineering, Konkuk University, Seoul 05029, Republic of Korea
Abstract
Human pose estimation is an important problem in computer vision because it is the foundation for many advanced semantic tasks and downstream applications. Although some convolutional neural network-based pose estimation methods have achieved good results, these networks are still limited by restricted receptive fields and weak robustness, leading to poor detection performance in blurry or low-resolution scenarios. Additionally, their highly parallelized strategy tends to incur significant computational demands, requiring high computing power. Compared with convolutional neural networks, transformer-based methods offer advantages such as flexible stacking, a global receptive field, and parallel computation. Motivated by these benefits, a novel transformer-based human pose estimation method is developed that employs multi-head self-attention mechanisms and offset (shifted) windows to effectively suppress the rapid growth of computational complexity near human keypoints. Experimental results, supported by detailed visual comparison and quantitative analysis, demonstrate that the proposed method can efficiently handle pose estimation in challenging scenarios, such as blurry or occluded scenes. Furthermore, errors in human skeleton mapping caused by keypoint occlusion or omission can be effectively corrected, so the accuracy of the pose estimation results is greatly improved.
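The window-restricted multi-head self-attention the abstract refers to can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the paper's implementation: learned projection weights are omitted (query, key, and value are taken directly from the input), and all function and variable names are hypothetical. The point it demonstrates is that restricting attention to local windows keeps the attention matrix at window size rather than full image size, which bounds the complexity growth the abstract describes.

```python
import numpy as np

def window_partition(x, ws):
    # Split an (H, W, C) feature map into non-overlapping ws x ws windows,
    # returning shape (num_windows, ws*ws, C). H and W must be divisible by ws.
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def window_mhsa(x, num_heads, ws, shift=0):
    # Multi-head self-attention restricted to local windows.
    # Each attention matrix is (ws*ws, ws*ws) per window, so cost scales
    # linearly in image area rather than quadratically as in global attention.
    # `shift` cyclically offsets the feature map before partitioning,
    # mimicking the offset-window idea (cross-window masking is omitted).
    if shift:
        x = np.roll(x, (-shift, -shift), axis=(0, 1))
    H, W, C = x.shape
    head_dim = C // num_heads
    wins = window_partition(x, ws)                       # (nW, N, C), N = ws*ws
    nW, N, _ = wins.shape
    # Toy projections: a real model would apply learned Wq, Wk, Wv here.
    q = k = v = wins.reshape(nW, N, num_heads, head_dim).transpose(0, 2, 1, 3)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim))
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(nW, N, C)
    return out
```

For an 8x8 feature map with 4-channel features, `window_mhsa(x, num_heads=2, ws=4)` produces four windows of 16 tokens each; alternating `shift=0` and `shift=ws // 2` across successive blocks is what lets information flow between otherwise disjoint windows.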
Funder
Korea Government (Ministry of Science and ICT; Ministry of Education)
References (33 articles)
1. Song; Human pose estimation and its application to action recognition: A survey. J. Vis. Commun. Image Represent., 2021.
2. Solomon, E., and Cios, K.J. (2023). FASS: Face anti-spoofing system using image quality features and deep learning. Electronics, 12.
3. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv.
4. Mao, W., Ge, Y., Shen, C., Tian, Z., Wang, X., and Wang, Z. (2021). TFPose: Direct human pose estimation with transformers. arXiv.
5. Wang, Z., Yu, Z., Zhao, C., Zhu, X., Qin, Y., Zhou, Q., Zhou, F., and Lei, Z. (2020, January 14–19). Deep spatial gradient and temporal depth learning for face anti-spoofing. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.