Affiliation:
1. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
2. Computer Vision and Robot Research Center, International Digital Economy Academy, Shenzhen, Guangdong, P. R. China
Abstract
Road segmentation is essential to unmanned systems, contributing to road perception and navigation in the field of autonomous driving. While multi-modal road segmentation methods have shown promising results by leveraging the complementary RGB and Depth data to provide robust 3D geometry information, existing methods suffer from severe efficiency problems that hinder their practical application in autonomous driving. Their direct concatenation of multi-modal features with a densely-connected network widens the semantic gaps among modalities and scales, causing high computational and time complexity. To address these issues, we propose a Multi-modal Scale-aware Attention Network (MSAN) that fuses RGB and Depth data effectively via a novel transformer-based cross-attention module, namely the Multi-modal Scale-aware Transformer (MST), which fuses RGB-D features from a global perspective across multiple scales. To better consolidate features at different scales, we further propose a Scale-aware Attention Module (SAM) that captures channel-wise attention efficiently for cross-scale fusion. These two attention-based modules exploit the complementarity of modalities and scales, narrowing the semantic gaps while avoiding complex structures for road segmentation. Extensive experiments demonstrate that MSAN achieves competitive performance at a low computational cost, making it suitable for real-time deployment on edge devices in autonomous driving systems.
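The abstract describes two attention mechanisms: transformer-style cross-attention that lets RGB features attend globally to depth features (MST), and channel-wise attention for cross-scale fusion (SAM). The paper's exact layer definitions are not given here, so the following is only a minimal NumPy sketch of the two generic building blocks under assumed shapes and projection weights (all names, `cross_attention`, `channel_attention`, and the random weights are illustrative, not the authors' implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(rgb, depth, Wq, Wk, Wv):
    """Generic cross-attention: RGB tokens query depth tokens,
    giving each RGB location a global view of the depth map."""
    q, k, v = rgb @ Wq, depth @ Wk, depth @ Wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (T, T) attention map
    return rgb + scores @ v                           # residual fusion

def channel_attention(feats):
    """Generic channel-wise gating (squeeze-excitation style):
    global average pool over tokens, sigmoid gate per channel."""
    pooled = feats.mean(axis=0)                       # (C,)
    gate = 1.0 / (1.0 + np.exp(-pooled))              # per-channel weight in (0, 1)
    return feats * gate

rng = np.random.default_rng(0)
T, C = 16, 8                                          # tokens, channels (assumed)
rgb = rng.standard_normal((T, C))
depth = rng.standard_normal((T, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) for _ in range(3))
fused = channel_attention(cross_attention(rgb, depth, Wq, Wk, Wv))
print(fused.shape)  # (16, 8): same shape as the RGB branch
```

The sketch only illustrates the data flow the abstract describes, modality fusion first, then channel-wise reweighting; the actual MST/SAM modules operate on multi-scale feature pyramids from a segmentation backbone.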
Publisher
World Scientific Pub Co Pte Ltd
Subject
Control and Optimization, Aerospace Engineering, Automotive Engineering, Control and Systems Engineering
Cited by
1 article.