Multi-view 3D human pose estimation based on multi-scale feature by orthogonal projection

Published: 2024 Volume: 522 Page: 01043
ISSN: 2267-1242
Container-title: E3S Web of Conferences
Short-container-title: E3S Web Conf.

Author:

Wang Yinghan, Dong Jianmin, Wang Yanan, Sun Bingyang

Abstract

To address inaccurate estimation results, complicated matching of feature information across views, and poor robustness of network models in complex scenes, a multi-view multi-person 3D human pose estimation model based on multi-scale feature orthogonal projection is proposed. The model comprises a multi-scale orthogonal projection fusion network and an orthogonal feature ascending dimension network. First, the multi-scale orthogonal projection fusion network orthogonally projects features at multiple scales and uses a residual structure to fuse the features in each plane separately, which simplifies feature learning and reduces the feature loss caused by projection. The fused features are then fed into the orthogonal feature ascending dimension network, which reconstructs higher-level 3D features using trilinear interpolation and deconvolution to improve the expressiveness of the model. Finally, the reconstructed features are fed to the backbone network to supplement its high-dimensional features, and the network regresses the 3D human pose according to the different stages of the task. Experimental results show that, compared with the previous method, the Percentage of 3D Correct Parts is improved on the Campus and Shelf datasets, the Mean Per Joint Position Error is reduced on the CMU Panoptic dataset, and the average accuracy is improved at smaller thresholds. The prediction results also remain better than those of the previous method when the number of input views to the trained model is reduced. The proposed method therefore estimates the 3D human pose effectively while improving prediction accuracy and the robustness of the network model.
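
The two evaluation metrics named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code. The function names, the limb list, and the threshold alpha = 0.5 (a common convention under which a limb counts as correct if the mean error of its two endpoints is at most half the ground-truth limb length) are assumptions made for illustration. The abstract does not state the exact protocol used in the paper.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Per Joint Position Error: mean Euclidean distance between
    predicted and ground-truth joints. Inputs have shape (num_joints, 3)
    and must share the same unit (e.g. millimetres)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def pcp3d(pred, gt, limbs, alpha=0.5):
    """Percentage of Correct Parts in 3D (assumed convention): a limb
    (a, b) is correct if the mean error of its two endpoints is at most
    alpha times the ground-truth limb length."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    correct = 0
    for a, b in limbs:
        endpoint_err = (np.linalg.norm(pred[a] - gt[a])
                        + np.linalg.norm(pred[b] - gt[b])) / 2.0
        limb_len = np.linalg.norm(gt[a] - gt[b])
        correct += int(endpoint_err <= alpha * limb_len)
    return correct / len(limbs)


# Hypothetical three-joint chain with two limbs of 100 mm each.
gt_demo = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 100.0], [0.0, 0.0, 200.0]])
pred_demo = gt_demo + np.array([[0.0, 0.0, 0.0], [30.0, 0.0, 0.0], [0.0, 80.0, 0.0]])
limbs_demo = [(0, 1), (1, 2)]

print(mpjpe(pred_demo, gt_demo))              # per-joint errors are 0, 30 and 80 mm
print(pcp3d(pred_demo, gt_demo, limbs_demo))  # first limb passes, second fails
```

A lower MPJPE and a higher PCP indicate better pose estimates, which matches the direction of the improvements reported in the abstract.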

Publisher

EDP Sciences

Link

https://www.e3s-conferences.org/10.1051/e3sconf/202452201043/pdf

Reference: 21 articles.

1. Li R., Yang S., Ross D.A., et al. Learn to dance with aist++: Music conditioned 3d dance generation. arXiv preprint arXiv:2101.08779, 2021, 2(3).

2. Continuous body and hand gesture recognition for natural human-computer interaction

3. Qiu H., Wang C., Wang J., et al. Cross view fusion for 3d human pose estimation. Proceedings of the IEEE/CVF International Conference on Computer Vision, (2019) 4342–4351.

4. Kocabas M., Karagoz S., Akbas E. Self-supervised learning of 3d human pose using multi-view geometry. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2019) 1077–1086.