LidarMultiNet: Towards a Unified Multi-Task Network for LiDAR Perception

Published: 2023-06-26 Issue: 3 Volume: 37 Pages: 3231-3240
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Authors:

Ye Dongqiangzi, Zhou Zixiang, Chen Weijia, Xie Yufei, Wang Yu, Wang Panqu, Foroosh Hassan

Abstract

LiDAR-based 3D object detection, semantic segmentation, and panoptic segmentation are usually implemented in specialized networks with distinctive architectures that are difficult to adapt to each other. This paper presents LidarMultiNet, a LiDAR-based multi-task network that unifies these three major LiDAR perception tasks. Among its many benefits, a multi-task network can reduce the overall cost by sharing weights and computation among multiple tasks. However, such a network typically underperforms a combination of independent single-task models. The proposed LidarMultiNet aims to bridge the performance gap between the multi-task network and multiple single-task networks. At the core of LidarMultiNet is a strong 3D voxel-based encoder-decoder architecture with a Global Context Pooling (GCP) module that extracts global contextual features from a LiDAR frame. Task-specific heads are added on top of the network to perform the three LiDAR perception tasks. More tasks can be implemented simply by adding new task-specific heads at little additional cost. A second stage is also proposed to refine the first-stage segmentation and generate accurate panoptic segmentation results. LidarMultiNet is extensively tested on both the Waymo Open Dataset and the nuScenes dataset, demonstrating for the first time that the major LiDAR perception tasks can be unified in a single strong network that is trained end-to-end and achieves state-of-the-art performance. Notably, LidarMultiNet took official 1st place in the Waymo Open Dataset 3D Semantic Segmentation Challenge 2022, with the highest mIoU and the best accuracy for most of the 22 classes on the test set, using only LiDAR points as input. It also sets a new state of the art for a single model on the Waymo 3D object detection benchmark and on three nuScenes benchmarks.
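The abstract's cost argument (shared weights and computation, with new tasks added as extra heads) can be illustrated with a toy sketch. This is not LidarMultiNet's implementation: `SharedBackbone` is a hypothetical stand-in (one linear layer plus ReLU over per-point features) for the paper's 3D voxel-based encoder-decoder and GCP module, and the class names, feature sizes, and the 8-value "detection" output are illustrative assumptions. It only shows the structural idea that features are computed once per frame and each task head is a small layer on top.

```python
from __future__ import annotations

import numpy as np


class SharedBackbone:
    """Hypothetical stand-in for a shared encoder: one linear layer + ReLU per point."""

    def __init__(self, in_dim: int, feat_dim: int, rng: np.random.Generator):
        self.w = rng.standard_normal((in_dim, feat_dim)) * 0.1
        self.calls = 0  # counts forward passes, to show features are computed once

    def __call__(self, points: np.ndarray) -> np.ndarray:
        self.calls += 1
        return np.maximum(points @ self.w, 0.0)  # (N, feat_dim) shared features


class LinearHead:
    """Hypothetical task-specific head: a single linear layer on the shared features."""

    def __init__(self, feat_dim: int, out_dim: int, rng: np.random.Generator):
        self.w = rng.standard_normal((feat_dim, out_dim)) * 0.1
        self.b = np.zeros(out_dim)

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        return feats @ self.w + self.b


class MultiTaskNet:
    """Runs the backbone once and feeds its output to every registered head."""

    def __init__(self, backbone: SharedBackbone):
        self.backbone = backbone
        self.heads: dict[str, LinearHead] = {}

    def add_head(self, name: str, head: LinearHead) -> None:
        # Adding a task only registers another head; the backbone is untouched.
        self.heads[name] = head

    def __call__(self, points: np.ndarray) -> dict[str, np.ndarray]:
        feats = self.backbone(points)  # shared computation, done once
        return {name: head(feats) for name, head in self.heads.items()}


rng = np.random.default_rng(0)
points = rng.standard_normal((1000, 4))  # toy per-point input, e.g. x, y, z, intensity
net = MultiTaskNet(SharedBackbone(in_dim=4, feat_dim=64, rng=rng))
net.add_head("semantic", LinearHead(64, 22, rng))  # 22 classes, as in the Waymo challenge
net.add_head("detection", LinearHead(64, 8, rng))  # hypothetical per-point box/score outputs
outputs = net(points)
print({name: out.shape for name, out in outputs.items()})
# {'semantic': (1000, 22), 'detection': (1000, 8)}
```

Because `feats` is computed once, each additional head adds only its own layer's cost, which mirrors the abstract's point that new tasks are cheap when a shared encoder-decoder does most of the work.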

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 16 articles.

1. Joint Semantic Segmentation using representations of LiDAR point clouds and camera images; Information Fusion; 2024-08

2. A comprehensive overview of deep learning techniques for 3D point cloud classification and semantic segmentation; Machine Vision and Applications; 2024-05-18

3. LPFormer: LiDAR Pose Estimation Transformer with Multi-Task Network; 2024 IEEE International Conference on Robotics and Automation (ICRA); 2024-05-13

4. AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual Vision Transformer; 2024 IEEE International Conference on Robotics and Automation (ICRA); 2024-05-13

5. Multi-Granular Transformer for Motion Prediction with LiDAR; 2024 IEEE International Conference on Robotics and Automation (ICRA); 2024-05-13