YOLOv7-3D: A Monocular 3D Traffic Object Detection Method from a Roadside Perspective
Published: 2023-10-17
Issue: 20
Volume: 13
Page: 11402
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Ye Zixun 1,2, Zhang Hongying 1, Gu Jingliang 2, Li Xue 1
Affiliation:
1. School of Information Engineering, Southwest University of Science and Technology, Mianyang 621010, China
2. Institute of Applied Electronics, CAEP, Mianyang 621900, China
Abstract
Current autonomous driving systems predominantly focus on 3D object perception from the vehicle’s perspective. However, single-camera 3D object detection in roadside monitoring scenarios provides stereoscopic perception of traffic objects, enabling more accurate collection and analysis of traffic information and thereby reliable support for urban traffic safety. In this paper, we propose YOLOv7-3D, an algorithm specifically designed for single-camera 3D object detection from a roadside viewpoint. Our approach combines several cues, including 2D bounding boxes, projected corner keypoints, and offset vectors relative to the 2D bounding-box centers, to improve the accuracy of 3D bounding-box detection. In addition, we introduce a 5-layer feature pyramid network (FPN) structure and a multi-scale spatial attention mechanism to improve feature saliency for objects of different scales, further enhancing the detection accuracy of the network. Experimental results demonstrate that YOLOv7-3D achieves significantly higher detection accuracy on the Rope3D dataset while reducing computational complexity by 60%.
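The abstract describes recovering the projected corners of the 3D box from offset vectors predicted relative to the 2D bounding-box center. The snippet below is a minimal sketch of that decoding step, not the authors' implementation; the (cx, cy, w, h) box encoding, the per-corner offset normalization by box size, and the helper name decode_corners are assumptions made for illustration.

# Minimal sketch (assumptions: 2D box given as (cx, cy, w, h) in pixels,
# eight per-corner offsets predicted relative to the box center and
# normalized by the box width/height).
import numpy as np

def decode_corners(box_2d, corner_offsets):
    """Recover the 8 projected 3D-box corner keypoints in pixel coordinates.

    box_2d:         (4,)   2D box as (cx, cy, w, h) in pixels.
    corner_offsets: (8, 2) offsets of each projected corner from the 2D box
                    center, normalized by the box width/height (assumption).
    """
    cx, cy, w, h = box_2d
    # Scale the normalized offsets back to pixels and shift by the box center.
    return np.array([cx, cy]) + corner_offsets * np.array([w, h])

# Usage example with made-up numbers.
box = np.array([320.0, 240.0, 80.0, 60.0])
offsets = np.random.uniform(-0.6, 0.6, size=(8, 2))
print(decode_corners(box, offsets).shape)  # (8, 2)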
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science