Category-Level Object Pose Estimation with Statistic Attention
Author:
Jiang Changhong 1, Mu Xiaoqiao 2, Zhang Bingbing 3, Liang Chao 4, Xie Mujun 1
Affiliation:
1. School of Electrical and Electronic Engineering, Changchun University of Technology, Changchun 130012, China
2. School of Mechanical and Electrical Engineering, Changchun University of Technology, Changchun 130012, China
3. School of Computer Science and Engineering, Dalian Minzu University, Dalian 116602, China
4. College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China
Abstract
Six-dimensional object pose estimation is a fundamental problem in computer vision. Recently, category-level object pose estimation methods built on 3D graph convolution (3D-GC) have achieved significant breakthroughs. However, current methods often fail to capture long-range dependencies, which are crucial for modeling complex and occluded object shapes; discerning fine-grained differences between objects is equally essential. Some existing methods employ self-attention mechanisms or Transformer encoder–decoder structures to compensate for the lack of long-range dependencies, but they exploit only first-order feature information, leaving higher-order information unexplored and neglecting detailed inter-object differences. In this paper, we propose SAPENet, which follows the 3D-GC architecture but replaces the 3D-GC layers in the encoder with HS-layers for feature extraction and incorporates statistical attention to compute higher-order statistical information. In addition, three sub-modules are designed for pose regression, point cloud reconstruction, and bounding box voting. The pose regression module also integrates statistical attention, leveraging higher-order statistics to model geometric relationships and aid regression. Experiments demonstrate that our method achieves outstanding performance, attaining an mAP of 49.5 on the 5° 2 cm metric, 3.4 points higher than the baseline model, and reaching state-of-the-art (SOTA) performance on the REAL275 dataset.
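The abstract contrasts first-order attention with attention driven by higher-order statistics. As a rough illustration of the idea (not the paper's actual SAPENet module, whose details are not given here), the sketch below gates per-point features with channel weights derived from a second-order statistic, the feature covariance; all function and variable names are hypothetical.

```python
import numpy as np

def statistic_attention(feats: np.ndarray) -> np.ndarray:
    """Illustrative 'statistical attention' sketch (assumed design):
    reweight point features using second-order (covariance) statistics
    computed across the point dimension.

    feats: (N, C) array of per-point features.
    Returns a reweighted array of the same shape.
    """
    mu = feats.mean(axis=0, keepdims=True)            # first-order statistic (mean)
    centered = feats - mu
    cov = centered.T @ centered / feats.shape[0]      # (C, C) second-order statistic
    energy = np.diag(cov)                             # per-channel variance
    # sigmoid gate centered on the mean channel energy
    weights = 1.0 / (1.0 + np.exp(-(energy - energy.mean())))
    return feats * weights[None, :]

pts = np.random.default_rng(0).normal(size=(128, 16))
out = statistic_attention(pts)
print(out.shape)  # (128, 16)
```

In this toy version, channels with above-average variance are amplified and low-variance channels are suppressed, which is one simple way second-order information can modulate features beyond what a plain (first-order) weighted sum provides.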
Funder
Science and Technology Development Program Project of Jilin Province