Abstract
Sensor fusion is an important component of the perception system in autonomous driving, and fusing radar point cloud information with camera visual information can improve the perception capability of autonomous vehicles. However, most existing studies ignore the extraction of local neighborhood information and perform only shallow fusion of the two modalities based on extracted global information, so they cannot achieve deep, interactive fusion of cross-modal context. Meanwhile, in data preprocessing, noise in the radar data is usually filtered only by depth information predicted from image features; such methods degrade the accuracy of the radar branch in generating regions of interest and cannot effectively filter out irrelevant radar points. This paper proposes the CenterTransFuser model, which makes full use of millimeter-wave radar point cloud information and visual information to enable cross-modal fusion of the two heterogeneous sources. Specifically, a new interaction module called the cross-transformer is explored, which cooperatively exploits cross-modal cross-multiple attention and joint cross-multiple attention to mine complementary radar and image information. In addition, an adaptive depth thresholding filtering method is designed to reduce the noise of modality-irrelevant radar information projected onto the image. The CenterTransFuser model is evaluated on the challenging nuScenes dataset and achieves excellent performance. In particular, the detection accuracy is significantly improved for pedestrians, motorcycles, and bicycles, demonstrating the superiority and effectiveness of the proposed model.
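The following is a minimal sketch, in PyTorch, of the two ideas the abstract describes: bidirectional cross-attention fusion between radar and image feature tokens, and an adaptive depth-threshold filter for radar points projected onto the image. The module names, feature shapes, and the relative-tolerance thresholding rule are illustrative assumptions, not the authors' exact CenterTransFuser implementation.

```python
# Hypothetical sketch of (1) cross-modal cross-attention fusion and
# (2) adaptive depth-threshold filtering of projected radar points.
# Shapes, names, and the tolerance rule are assumptions for illustration.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuses radar and image feature tokens with two cross-attention passes."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # image tokens attend to radar tokens, and vice versa
        self.img_to_radar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.radar_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, radar_tokens):
        # img_tokens:   (B, N_img, dim) flattened image feature map
        # radar_tokens: (B, N_rad, dim) encoded radar points
        img_ctx, _ = self.img_to_radar(img_tokens, radar_tokens, radar_tokens)
        rad_ctx, _ = self.radar_to_img(radar_tokens, img_tokens, img_tokens)
        # residual fusion of the attended cross-modal context
        fused_img = self.norm(img_tokens + img_ctx)
        fused_radar = self.norm(radar_tokens + rad_ctx)
        return fused_img, fused_radar


def adaptive_depth_filter(radar_depth, predicted_depth, rel_tol: float = 0.2):
    """Keeps radar points whose depth agrees with the image-predicted depth.

    radar_depth:     (N,) depths of radar points projected onto the image plane
    predicted_depth: (N,) depths predicted at the corresponding pixels
    rel_tol:         relative tolerance that scales with depth, so the cutoff
                     adapts to near vs. far objects (illustrative rule)
    """
    threshold = rel_tol * predicted_depth
    return (radar_depth - predicted_depth).abs() <= threshold


if __name__ == "__main__":
    fusion = CrossModalFusion()
    img = torch.randn(2, 64, 256)     # 64 image tokens
    radar = torch.randn(2, 32, 256)   # 32 radar tokens
    fused_img, fused_radar = fusion(img, radar)
    print(fused_img.shape, fused_radar.shape)
```

A usage note: in a fusion pipeline of this kind, the depth filter would typically be applied before tokenizing the radar points, so that only depth-consistent points feed the cross-attention stage.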
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC