Authors:
Li Shichao, Huang Xijie, Liu Zechun, Cheng Kwang-Ting
Abstract
We present a new learning-based framework, S-3D-RCNN, that recovers accurate object orientation in SO(3) and simultaneously predicts implicit rigid shapes from stereo RGB images. For orientation estimation, in contrast to previous studies that map local appearance to observation angles, we propose a progressive approach that extracts meaningful Intermediate Geometrical Representations (IGRs). This approach features a deep model that transforms perceived intensities from one or two views into object part coordinates, enabling direct egocentric object orientation estimation in the camera coordinate system. To describe objects at a finer level than their 3D bounding boxes, we further investigate implicit shape estimation from stereo images. We model visible object surfaces with a point-based representation that augments the IGRs and explicitly addresses the problem of hallucinating unseen surfaces. Extensive experiments validate the effectiveness of the proposed IGRs, and S-3D-RCNN achieves superior 3D scene understanding performance. We also design new metrics on the KITTI benchmark to evaluate implicit shape estimation.
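The abstract does not say how predicted part coordinates are turned into a rotation. As a hedged illustration only, assume the network outputs canonical object-frame ("part") coordinates for a set of pixels whose camera-frame 3D positions are also known (for example, from stereo depth). Under that assumption, a rotation in SO(3) can be fit in closed form with the Kabsch (orthogonal Procrustes) algorithm. The helper names `kabsch`, `rotation_about_y`, and `yaw_from_rotation` are hypothetical, and this sketch is not presented as the S-3D-RCNN solver.

```python
import numpy as np


def rotation_about_y(theta: float) -> np.ndarray:
    """Rotation about the camera Y axis (KITTI-style yaw convention)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def kabsch(part_coords: np.ndarray, cam_points: np.ndarray):
    """Least-squares R, t with cam_points[i] ~= R @ part_coords[i] + t.

    Assumption (hypothetical helper): part_coords are (N, 3) object-frame
    coordinates predicted per pixel, and cam_points are the (N, 3)
    camera-frame positions of the same pixels, e.g. from stereo depth.
    """
    p_mean = part_coords.mean(axis=0)
    q_mean = cam_points.mean(axis=0)
    # Center both sets so the translation drops out of the rotation fit.
    p = part_coords - p_mean
    q = cam_points - q_mean
    # Cross-covariance H = sum_i p_i q_i^T, then its SVD.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    v = vt.T
    # Flip the last axis if needed so that det(R) = +1 (a proper rotation).
    d = 1.0 if np.linalg.det(v @ u.T) >= 0 else -1.0
    r = v @ np.diag([1.0, 1.0, d]) @ u.T
    t = q_mean - r @ p_mean
    return r, t


def yaw_from_rotation(r: np.ndarray) -> float:
    """Read the yaw angle back from a rotation about the Y axis."""
    return float(np.arctan2(r[0, 2], r[0, 0]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for predicted part coordinates in a car-sized box (m).
    part = rng.uniform([-0.8, -0.75, -2.0], [0.8, 0.75, 2.0], size=(200, 3))
    true_r = rotation_about_y(0.7)
    true_t = np.array([2.0, 1.5, 15.0])
    # Camera-frame points with small noise, mimicking imperfect stereo depth.
    cam = part @ true_r.T + true_t + rng.normal(scale=0.01, size=part.shape)
    r_est, t_est = kabsch(part, cam)
    print(round(yaw_from_rotation(r_est), 2))
```

Running the script should print a value close to 0.7, the yaw used to generate the synthetic data. For comparison, a method that regresses the KITTI observation angle alpha must still convert it to egocentric yaw using the object's location, ry = alpha + arctan2(x, z); fitting a full rotation from part coordinates produces a camera-frame orientation directly.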
Funder
Hong Kong Research Grants Council (RGC) General Research Fund
Publisher
Springer Science and Business Media LLC
Cited by
1 article