PolarFormer: Multi-Camera 3D Object Detection with Polar Transformer

Published:2023-06-26 Issue:1 Volume:37 Page:1042-1050
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Jiang Yanqin,Zhang Li,Miao Zhenwei,Zhu Xiatian,Gao Jin,Hu Weiming,Jiang Yu-Gang

Abstract

3D object detection in autonomous driving aims to reason about “what” and “where” the objects of interest are in a 3D world. Following the conventional wisdom of previous 2D object detection, existing methods often adopt the canonical Cartesian coordinate system with perpendicular axes. However, we contend that this does not fit the nature of the ego car’s perspective, as each onboard camera perceives the world in the shape of a wedge, intrinsic to the imaging geometry, with radial (non-perpendicular) axes. Hence, in this paper we advocate the exploitation of the Polar coordinate system and propose a new Polar Transformer (PolarFormer) for more accurate 3D object detection in the bird’s-eye-view (BEV), taking as input only multi-camera 2D images. Specifically, we design a cross-attention based Polar detection head that places no restriction on the shape of the input structure, so that it can deal with irregular Polar grids. To tackle the unconstrained object scale variations along the Polar distance dimension, we further introduce a multi-scale Polar representation learning strategy. As a result, our model can make the best use of the Polar representation, which is rasterized by attending to the corresponding image observations in a sequence-to-sequence fashion subject to geometric constraints. Thorough experiments on the nuScenes dataset demonstrate that our PolarFormer significantly outperforms state-of-the-art 3D object detection alternatives.
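As a rough illustration of the Polar BEV grid the abstract refers to, the following minimal sketch maps an ego-centred Cartesian point to a (radial, azimuth) cell. It is not the authors' implementation: the function names, the 50 m range, and the 64 x 256 grid resolution are all assumptions chosen for the example.

```python
import math


def cartesian_to_polar(x, y):
    """Convert an ego-centred BEV point (x, y) in metres to (radius, azimuth)."""
    radius = math.hypot(x, y)
    azimuth = math.atan2(y, x)  # radians, in [-pi, pi]
    return radius, azimuth


def polar_cell_index(x, y, max_radius=50.0, num_radial_bins=64, num_azimuth_bins=256):
    """Return the (radial_bin, azimuth_bin) of a point, or None if out of range.

    The range and bin counts are hypothetical defaults, not values from the paper.
    """
    radius, azimuth = cartesian_to_polar(x, y)
    if radius >= max_radius:
        return None
    radial_bin = int(radius / max_radius * num_radial_bins)
    # Shift azimuth to [0, 2*pi] and wrap so that +pi falls back into bin 0.
    azimuth_bin = int((azimuth + math.pi) / (2 * math.pi) * num_azimuth_bins) % num_azimuth_bins
    return radial_bin, azimuth_bin


def cell_arc_length(radial_bin, max_radius=50.0, num_radial_bins=64, num_azimuth_bins=256):
    """Approximate arc length (metres) spanned by one cell at the given radial bin."""
    mid_radius = (radial_bin + 0.5) * max_radius / num_radial_bins
    return mid_radius * 2 * math.pi / num_azimuth_bins
```

Because `cell_arc_length` grows linearly with distance, an object of fixed physical size covers fewer azimuth cells the farther it is from the ego car. This is one way to picture the scale variation along the distance dimension that the multi-scale Polar representation is meant to address.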

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 43 articles.

1. A survey on occupancy perception for autonomous driving: The information fusion perspective;Information Fusion;2025-02

2. Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking With Transformer;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-09

3. Monocular BEV Perception of Road Scenes via Front-to-Top View Projection;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-09

4. Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-view 3D Detection and Tracking;International Journal of Computer Vision;2024-07-16

5. Robust BEV 3D Object Detection for Vehicles with Tire Blow-Out;Sensors;2024-07-09