High‐order multilayer attention fusion network for 3D object detection

Published: 2024-09-08
ISSN: 2577-8196
Journal: Engineering Reports
Language: English

Authors:

Zhang Baowen¹, Zhao Yongyong¹, Su Chengzhi¹, Cao Guohua¹

Affiliation:

1. School of Mechanical and Electrical Engineering, Changchun University of Science and Technology, Changchun, China

Abstract

Three‐dimensional object detection based on the fusion of 2D image data and 3D point clouds has become a research hotspot in 3D scene understanding. However, data from different sensors differ in spatial position, scale, and alignment, which severely degrades detection performance, and inappropriate fusion methods can cause valuable information to be lost or corrupted by interference. We therefore propose the High‐Order Multi‐Level Attention Fusion Network (HMAF‐Net), which takes camera images and voxelized point clouds as inputs for 3D object detection. To enhance the expressive power of the features from different modalities, we introduce a high‐order feature fusion module that applies multi‐level convolutions to the element‐wise sum of the modality features. Through filtering and non‐linear activation, this module extracts deep semantic information from the fused multi‐modal features. To make the most of the fused salient features, we introduce an attention mechanism that dynamically evaluates the importance of the pooled features at each level, enabling adaptive weighted fusion of significant and secondary features. To validate HMAF‐Net, we conduct experiments on the KITTI dataset. In the “Car,” “Pedestrian,” and “Cyclist” categories, HMAF‐Net achieves mAP of 81.78%, 60.09%, and 63.91%, respectively, and performs more stably than other multi‐modal methods. We further evaluate the framework's effectiveness and generalization capability on the KITTI benchmark, comparing it with other published detection methods on the 3D and BEV detection benchmarks for the “Car” category, where it shows excellent results. The code and model will be made available at https://github.com/baowenzhang/High‐order‐Multilayer‐Attention‐Fusion‐Network‐for‐3D‐Object‐Detection.
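The abstract describes the fusion module only at a high level. As an illustration, here is a minimal NumPy sketch of that general idea under several assumptions not stated in the abstract: the image and voxel features have already been projected onto a common (C, H, W) grid, each level is a 3x3 convolution followed by ReLU, pooling is global average pooling, and level importance is a dot-product score normalised with a softmax. The function names `conv3x3` and `attention_fusion_sketch` and the parameter `attn_vec` are hypothetical, and the random weights stand in for trained ones; this is not the HMAF‐Net implementation.

```python
import numpy as np


def conv3x3(x, w):
    """Same-padded 3x3 convolution (cross-correlation, as in CNN layers).

    x: (C_in, H, W) feature map; w: (C_out, C_in, 3, 3) kernel.
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad height and width by 1
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def attention_fusion_sketch(f_img, f_pts, conv_weights, attn_vec):
    """Hypothetical high-order fusion + level attention (not the authors' code).

    f_img, f_pts: aligned (C, H, W) image and voxel features (alignment assumed).
    conv_weights: list of (C, C, 3, 3) kernels, one per level.
    attn_vec: (C,) hypothetical scoring vector for the pooled features.
    """
    x = f_img + f_pts  # element-wise sum of the two modalities
    levels = []
    for w in conv_weights:
        x = np.maximum(conv3x3(x, w), 0.0)  # filtering + ReLU non-linearity
        levels.append(x)
    levels = np.stack(levels)  # (L, C, H, W)
    pooled = levels.mean(axis=(2, 3))  # global average pooling per level, (L, C)
    scores = pooled @ attn_vec  # one importance score per level, (L,)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    fused = np.einsum("l,lchw->chw", weights, levels)  # adaptive weighted sum
    return fused, weights


# Usage with random, untrained parameters (illustration only).
rng = np.random.default_rng(0)
C, H, W, L = 8, 16, 16, 3
f_img = rng.standard_normal((C, H, W))
f_pts = rng.standard_normal((C, H, W))
conv_weights = [0.1 * rng.standard_normal((C, C, 3, 3)) for _ in range(L)]
attn_vec = rng.standard_normal(C)
fused, weights = attention_fusion_sketch(f_img, f_pts, conv_weights, attn_vec)
print(fused.shape)  # (8, 16, 16)
print(weights)  # one non-negative weight per level, summing to 1
```

Because each level feeds the next, deeper levels aggregate a wider spatial context, and the softmax lets the fused output lean on whichever level scores highest for a given input. A softmax over levels is only one plausible reading of "adaptive weighted fusion"; an independent sigmoid gate per level would be an equally reasonable assumption.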

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/eng2.12987

References (49 articles)

1. Shalev‐Shwartz S, Shammah S, Shashua A. On a formal model of safe and scalable self‐driving cars. Proceedings of the Artificial Intelligence. arXiv:1708.06374; 2018.

2. Wang Y, Shi T, Yun P, Tai L, Liu M. PointSeg: real‐time semantic segmentation based on 3D LiDAR point cloud. Proceedings of the Computer Vision and Pattern Recognition. arXiv:1807.06288; 2018.

3. Jiang Y, Javanmardi E, Tsukada M, Esaki H. Accurate cooperative localization utilizing LiDAR‐equipped roadside infrastructure for autonomous driving. Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC). arXiv:2407.08384; 2024.

4. Gamerdinger J, Teufel S, Amann S, Volk G, Bringmann O. LSM: a comprehensive metric for assessing the safety of lane detection systems in autonomous driving. Proceedings of the Computer Vision and Pattern Recognition (cs.CV). arXiv:2407.07740; 2024.

5. Hong D, Zhang B, Li X, et al. SpectralGPT: spectral remote sensing foundation model. Proceedings of the Computer Vision and Pattern Recognition (cs.CV). arXiv:2311.07113; 2023.