DON6D: a decoupled one-stage network for 6D pose estimation

Published:2024-04-10 Issue:1 Volume:14 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Wang Zheng,Tu Hangyao,Qian Yutong,Zhao Yanwei

Abstract

Six-dimensional (6D) object pose estimation is a key task in robotic manipulation and grasping scenes. Many existing two-stage solutions have slow inference and require extra refinement to handle variations in lighting, sensor noise, object occlusion, and truncation. To address these challenges, this work proposes a decoupled one-stage network (DON6D) for 6D pose estimation that improves inference speed while maintaining accuracy. Specifically, since the RGB images are aligned with the RGB-D images, DON6D first uses a two-dimensional detection network to locate the objects of interest in the RGB-D images. Then, a feature extraction and fusion module fully extracts color and geometric features. Further, dual data augmentation is performed to enhance the generalization ability of the model. Finally, the features are fused, and an attention residual encoder–decoder is introduced to improve pose estimation performance and obtain an accurate 6D pose. DON6D is evaluated on the LINEMOD and YCB-Video datasets. The results demonstrate that DON6D is superior to several state-of-the-art methods in terms of the ADD(-S) and ADD(-S) AUC metrics.
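For readers unfamiliar with the ADD(-S) metrics named in the abstract, the following is a minimal sketch of how they are commonly defined in the 6D pose literature. It is not taken from the DON6D paper or its code. ADD averages the distance between corresponding model points under the ground-truth and predicted poses; ADD-S, used for symmetric objects, averages the distance from each ground-truth point to its closest predicted point. The function names, the pure-Python list representation, and the toy cube model are all assumptions made for illustration.

```python
import math

def transform(points, R, t):
    """Apply the rigid transform x -> R x + t to each 3D point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed model points."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)

def add_s_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to the closest
    predicted point (brute force, O(n^2), fine for small toy models)."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(points)

# Hypothetical toy model: the 8 corners of a 10 cm cube centered at the origin.
cube = [(x, y, z) for x in (-0.05, 0.05) for y in (-0.05, 0.05) for z in (-0.05, 0.05)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rot_z_180 = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]  # a symmetry of the cube
zero = (0.0, 0.0, 0.0)

# A pure 1 cm translation error along x.
add_shift = add_metric(cube, identity, zero, identity, (0.01, 0.0, 0.0))

# A 180-degree rotation error about z, which maps the cube onto itself.
add_sym = add_metric(cube, identity, zero, rot_z_180, zero)
add_s_sym = add_s_metric(cube, identity, zero, rot_z_180, zero)
```

In this toy case the rotation error moves every corner onto a different corner, so ADD reports a large error while ADD-S reports essentially none, which is why the symmetric variant is used for objects with indistinguishable views.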

Funder

the Key Research and Development Program of Zhejiang Province

Research incubation Foundation of Hangzhou City University

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41598-024-59152-x.pdf

References: 45 articles.

1. Xie, S. Research on the industrial robot grasping method based on multisensor data fusion and binocular vision. Comput. Intell. Neurosci. 2022, 4443100 (2022).

2. Wen, B., Mitash, C. & Soorian, S. et al. Robust, Occlusion-aware Pose Estimation for Objects Grasped by Adaptive Hands. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 6210–6217, (2020).

3. Xu, D., Anguelov, D. & Jain, A. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation. In 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 244–253, (2018).

4. Hinterstoisser, S., Holzer, S. & Cagniart, C. et al. Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes. In 2011 International Conference on Computer Vision (ICCV), pp. 858–865, (2011).

5. Wang, C., Xu, D. & Zhu, Y. et al. DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion. In 2019 Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3344–3352, (2019).