Efficient Sampling of Two-Stage Multi-Person Pose Estimation and Tracking from Spatiotemporal
Published: 2024-03-07
Volume: 14, Issue: 6, Page: 2238
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Affiliation:
1. School of Automation, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. Beijing Key Laboratory of Network Systems and Network Culture, Beijing University of Posts and Telecommunications, Beijing 100876, China
3. School of Digital Media and Design Arts, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract
Tracking the articulated poses of multiple individuals in complex videos is a highly challenging task due to a variety of factors that compromise the accuracy of estimation and tracking. Existing frameworks often rely on intricate propagation strategies and extensive exchange of flow data between video frames. In this context, we propose a spatiotemporal sampling framework that addresses frame degradation at the feature level, offering a simple yet effective network block. Our spatiotemporal sampling mechanism enables the framework to extract meaningful features from neighboring video frames, thereby improving the accuracy of pose detection in the current frame. This approach also yields significant reductions in running latency. When evaluated on the COCO dataset and the mixed dataset, our approach outperforms other methods in terms of average precision (AP), average recall (AR), and acceleration ratio. Specifically, we achieve a 3.7% increase in AP, a 1.77% increase in AR, and a speedup of 1.51 times compared to mainstream state-of-the-art (SOTA) methods. Furthermore, when evaluated on the PoseTrack2018 dataset, our approach demonstrates superior accuracy in multi-object tracking, as measured by the multi-object tracking accuracy (MOTA) metric, achieving an 11.7% increase in MOTA over the prevailing SOTA methods.
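The core idea described above — sampling features from neighboring frames and aggregating them to enhance the current frame's representation — can be illustrated with a minimal sketch. The function name, the cosine-similarity weighting, and the temporal radius below are assumptions for illustration only; the paper's actual block learns its sampling weights, which this stand-in does not.

```python
import numpy as np

def sample_spatiotemporal(features, t, radius=1):
    """Aggregate feature maps from frames neighboring frame t.

    features: array of shape (T, C, H, W), one feature map per frame.
    Weights are a softmax over cosine similarity to the current frame,
    a hand-crafted stand-in for the learned sampling weights.
    """
    T = features.shape[0]
    # Clip the temporal window to the valid frame range.
    idx = [i for i in range(t - radius, t + radius + 1) if 0 <= i < T]
    ref = features[t].ravel()
    sims = np.array([
        float(ref @ features[i].ravel())
        / (np.linalg.norm(ref) * np.linalg.norm(features[i]) + 1e-8)
        for i in idx
    ])
    # Softmax over similarities: closer-matching frames contribute more.
    w = np.exp(sims - sims.max())
    w /= w.sum()
    # Weighted sum of neighbor feature maps, shape (C, H, W).
    return np.tensordot(w, features[idx], axes=1)
```

Because the output has the same shape as a single frame's feature map, such a block can be dropped into a two-stage pipeline before the pose head without changing downstream layers.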
Cited by: 1 article.