Affiliation:
1. School of Intelligence Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
2. School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Abstract
In recent years, human pose estimation, a subfield of computer vision and artificial intelligence, has achieved significant performance improvements, driven by its wide applications in human-computer interaction, virtual reality, and smart security. However, most existing methods are designed for single-person scenes and suffer from low accuracy and long inference time in multi-person scenes. To address this issue, increasing attention has been paid to multi-person pose estimation, for example bottom-up methods based on Part Affinity Fields (PAFs) that estimate the 2D poses of multiple people. In this study, we propose a method that addresses the problems of low network accuracy and poor estimation of flexible joints. The method introduces an attention mechanism into the network and uses a joint point extraction method based on hard example mining. Integrating the attention mechanism improves the network's overall performance, while the joint point extraction method improves the localization accuracy of flexible joints without increasing complexity. Experimental results demonstrate that the proposed method significantly improves the accuracy of 2D human pose estimation. Our network achieved an Average Precision (AP) score of 60.0 and outperformed competing methods on the standard COCO test dataset.
Funder
National Natural Science Foundation of China
Subject
Management, Monitoring, Policy and Law; Renewable Energy, Sustainability and the Environment; Geography, Planning and Development; Building and Construction