DASANet: A 3D Object Detector with Density-and-Sparsity Feature Aggregation
Published: 2023-09-18
Issue: 18
Volume: 15
Page: 4587
ISSN: 2072-4292
Container-title: Remote Sensing
Language: en
Short-container-title: Remote Sensing
Author:
Zhang Qiang 1 (ORCID), Wei Dongdong 2
Affiliation:
1. Remote Sensing Image Processing and Fusion Group, School of Electronic Engineering, Xidian University, Xi’an 710071, China
2. Hangzhou Institute of Technology, Xidian University, Hangzhou 311200, China
Abstract
In the fields of autonomous driving and robotics, 3D object detection is a difficult but important task. To improve detection accuracy, LiDAR sensors, which collect the 3D point cloud of a scene, are constantly being upgraded. However, the density of the collected 3D points remains low, and their distribution over the scene is unbalanced, which degrades the accuracy with which 3D object detectors locate and identify objects. Although corresponding high-resolution scene images from cameras can serve as supplementary information, poor fusion strategies can yield lower accuracy than LiDAR-point-only detectors. Thus, to improve detection performance in the classification, localization, and even boundary location of 3D objects, a two-stage detector with density-and-sparsity feature aggregation, called DASANet, is proposed in this paper. In the first stage, dense pseudo point clouds are generated from camera images and used to obtain initial proposals. In the second stage, two novel feature aggregation modules fuse LiDAR point information and pseudo point information, refining the semantic and detailed representation of the feature maps. To supplement the semantic information of the highest-scale LiDAR features for object localization and classification, a triple differential information supplement (TDIS) module is presented, which extracts LiDAR-pseudo differential features and enhances them in the spatial, channel, and global dimensions. To enrich the detailed information of the LiDAR features for object boundary location, a Siamese three-dimension coordinate attention (STCA) module is presented, which extracts stable LiDAR and pseudo point cloud features with a Siamese encoder and fuses them using a three-dimension coordinate attention. Experiments on the KITTI Vision Benchmark Suite demonstrate the improved performance of DASANet in the localization and boundary location of objects, and the ablation studies demonstrate the effectiveness of the TDIS and STCA modules.
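To make the fusion idea concrete, the following is a minimal sketch of an STCA-style block: a weight-shared (Siamese) encoder processes the LiDAR and pseudo point cloud feature volumes, and a three-dimension coordinate attention re-weights the concatenated result along the depth, height, and width axes before projecting back to the original channel count. This is an illustrative PyTorch reconstruction, not the authors' implementation; the class and parameter names (CoordAttention3D, SiameseFusionSketch, reduction) are hypothetical, and details such as the pooling type, normalization, and channel widths are assumptions.

# Illustrative sketch only -- not the authors' code.
# Assumes voxelized feature volumes of shape (B, C, D, H, W).
import torch
import torch.nn as nn


class CoordAttention3D(nn.Module):
    """Simplified 3D coordinate attention: pool along each spatial axis,
    produce a per-axis gating map, and re-weight the input."""

    def __init__(self, channels: int, reduction: int = 8):  # 'reduction' is assumed
        super().__init__()
        mid = max(channels // reduction, 4)
        self.squeeze = nn.Sequential(
            nn.Conv3d(channels, mid, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        self.gate_d = nn.Conv3d(mid, channels, kernel_size=1)
        self.gate_h = nn.Conv3d(mid, channels, kernel_size=1)
        self.gate_w = nn.Conv3d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Directional average pooling keeps one axis and collapses the others.
        pool_d = x.mean(dim=(3, 4), keepdim=True)  # (B, C, D, 1, 1)
        pool_h = x.mean(dim=(2, 4), keepdim=True)  # (B, C, 1, H, 1)
        pool_w = x.mean(dim=(2, 3), keepdim=True)  # (B, C, 1, 1, W)
        a_d = torch.sigmoid(self.gate_d(self.squeeze(pool_d)))
        a_h = torch.sigmoid(self.gate_h(self.squeeze(pool_h)))
        a_w = torch.sigmoid(self.gate_w(self.squeeze(pool_w)))
        # Broadcasting applies the three axis-wise attention maps to x.
        return x * a_d * a_h * a_w


class SiameseFusionSketch(nn.Module):
    """Weight-shared encoder over both modalities, fused by 3D coordinate
    attention -- a rough stand-in for the STCA module."""

    def __init__(self, channels: int):
        super().__init__()
        self.encoder = nn.Sequential(  # the same weights see both inputs
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.attention = CoordAttention3D(2 * channels)
        self.project = nn.Conv3d(2 * channels, channels, kernel_size=1)

    def forward(self, lidar_feat: torch.Tensor, pseudo_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.encoder(lidar_feat), self.encoder(pseudo_feat)], dim=1)
        return self.project(self.attention(fused))


if __name__ == "__main__":
    fusion = SiameseFusionSketch(channels=64)
    lidar = torch.randn(2, 64, 8, 32, 32)   # (B, C, D, H, W)
    pseudo = torch.randn(2, 64, 8, 32, 32)
    print(fusion(lidar, pseudo).shape)      # torch.Size([2, 64, 8, 32, 32])

A TDIS-style supplement could be sketched along similar lines, for instance by gating the element-wise difference between the pseudo and LiDAR feature volumes in the spatial, channel, and global dimensions, but the paper's exact formulation is not reproduced here.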
Funder
National Natural Science Foundation of China
Subject
General Earth and Planetary Sciences