A Compact and Powerful Single-Stage Network for Multi-Person Pose Estimation
-
Published:2023-02-08
Issue:4
Volume:12
Page:857
-
ISSN:2079-9292
-
Container-title:Electronics
-
language:en
-
Short-container-title:Electronics
Author:
Xiao Yabo (1), Wang Xiaojuan (1), He Mingshu (1), Jin Lei (1), Song Mei (1), Zhao Jian (2, 3)
Affiliation:
1. School of Electronic Engineering, Beijing University of Posts and Telecommunications, No.10, Xitucheng Road, Haidian District, Beijing 100876, China
2. Institute of North Electronic Equipment, Beijing 100191, China
3. Department of Mathematics and Theories, Peng Cheng Laboratory, Shenzhen 518055, China
Abstract
Multi-person pose estimation generally follows either a top-down or a bottom-up paradigm. The top-down paradigm detects all human boxes and then performs single-person pose estimation on each ROI. The bottom-up paradigm locates identity-free keypoints and then groups them into individuals. Both paradigms use an extra stage to build the relationship between each human instance and its keypoints (e.g., human detection in the top-down paradigm or a grouping process in the bottom-up paradigm). This extra stage leads to high computation cost and a redundant two-stage pipeline. To address this issue, we introduce a fine-grained body representation method. Concretely, the human body is divided into several local parts, and each part is represented by an adaptive point. This body representation can sufficiently encode diverse pose information and effectively model the relationship between human instances and their keypoints in a single forward pass. Building on this representation, we further introduce a compact single-stage multi-person pose regression network, called AdaptivePose++, which is the extended version of the AAAI-22 paper AdaptivePose. During inference, the network needs only a single-step decode operation to estimate multi-person poses, without complex post-processing or refinement. Without any bells and whistles, we achieve the most competitive performance in terms of accuracy and speed on the representative 2D pose estimation benchmarks MS COCO and CrowdPose. In particular, AdaptivePose++ outperforms the state-of-the-art SWAHR-W48 and CenterGroup-W48 by 3.2 AP and 1.4 AP, respectively, on COCO mini-val, with faster inference speed. Furthermore, its strong performance on the 3D pose estimation datasets MuCo-3DHP and MuPoTS-3D demonstrates its effectiveness and generalizability to 3D scenes.
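The abstract describes the single-step decode only at a high level. The following is a minimal sketch, under stated assumptions, of what decoding such a representation could look like: every local maximum of a human-center heatmap is treated as one person, and each keypoint is recovered as center + (center to adaptive point of its part) + (adaptive point to keypoint), with no detection or grouping stage. The function names `find_centers` and `decode_poses`, the nested-list layout indexed `[channel][row][col]`, the 3x3 peak test, the threshold value, and reading both offsets at the center cell are all hypothetical simplifications for illustration, not AdaptivePose++'s actual implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical layouts (assumptions for illustration only):
#   heatmap[row][col]          -> human-center confidence
#   part_offsets[p][row][col]  -> (dy, dx) from a center to the adaptive point of part p
#   kp_offsets[k][row][col]    -> (dy, dx) from that adaptive point to keypoint k
Grid = List[List[float]]
OffsetMap = List[List[Tuple[float, float]]]
Point = Tuple[float, float]


def find_centers(heatmap: Grid, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) cells that are >= threshold and maximal in their 3x3 window.

    Ties between neighboring cells yield duplicate peaks; a real decoder
    would suppress them, which this sketch does not do.
    """
    h, w = len(heatmap), len(heatmap[0])
    peaks: List[Tuple[int, int]] = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            if v < threshold:
                continue
            window = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            if v >= max(window):
                peaks.append((r, c))
    return peaks


def decode_poses(
    heatmap: Grid,
    part_offsets: Sequence[OffsetMap],
    kp_offsets: Sequence[OffsetMap],
    part_of_kp: Sequence[int],
    threshold: float = 0.3,
) -> List[List[Point]]:
    """One pass, no grouping: each center peak directly yields one full pose."""
    poses: List[List[Point]] = []
    for r, c in find_centers(heatmap, threshold):
        pose: List[Point] = []
        for k, p in enumerate(part_of_kp):
            # Simplifying assumption: both hops are read at the center cell.
            pdy, pdx = part_offsets[p][r][c]  # center -> adaptive point of part p
            kdy, kdx = kp_offsets[k][r][c]    # adaptive point -> keypoint k
            pose.append((r + pdy + kdy, c + pdx + kdx))
        poses.append(pose)
    return poses


if __name__ == "__main__":
    size = 5

    def zeros() -> OffsetMap:
        return [[(0.0, 0.0)] * size for _ in range(size)]

    hm = [[0.0] * size for _ in range(size)]
    hm[2][2] = 0.9  # a single person centered at (2, 2)

    parts = [zeros(), zeros()]       # two hypothetical body parts
    parts[0][2][2] = (-1.0, 0.0)     # e.g. an upper-body adaptive point
    parts[1][2][2] = (1.0, 0.0)      # e.g. a lower-body adaptive point

    kps = [zeros(), zeros(), zeros()]  # three hypothetical keypoints
    kps[0][2][2] = (0.0, -0.5)
    kps[1][2][2] = (0.0, 0.5)
    kps[2][2][2] = (0.5, 0.0)

    print(decode_poses(hm, parts, kps, part_of_kp=[0, 0, 1]))
    # [[(1.0, 1.5), (1.0, 2.5), (3.5, 2.0)]]
```

Because every keypoint in this sketch is attached to a center peak through its part's adaptive point, instance identity is fixed at decode time, which is the property the abstract contrasts with the extra detection or grouping stage of two-stage pipelines.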
Funder
National Nature Fund; Young Elite Scientist Sponsorship Program of China Association for Science and Technology
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
References: 62 articles
1. Xiao, Y., Wang, X.J., Yu, D., Wang, G., Zhang, Q., and He, M. (2022, February 22–March 1). AdaptivePose: Human Parts as Adaptive Points. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual.
2. Luo, Z., Wang, Z., Huang, Y., Wang, L., Tan, T., and Zhou, E. (2021, June 19–25). Rethinking the heatmap regression for bottom-up human pose estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual.
3. Brasó, G., Kister, N., and Leal-Taixé, L. (2021, October 11–17). The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation. Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual.
4. Papandreou, G., Zhu, T., and Kanazawa, N. (2017, July 21–26). Towards accurate multi-person pose estimation in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.
5. Newell, A., Yang, K., and Deng, J. (2016, October 8–16). Stacked hourglass networks for human pose estimation. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.
Cited by: 4 articles