Multi-Task Foreground-Aware Network with Depth Completion for Enhanced RGB-D Fusion Object Detection Based on Transformer

Author:

Pan Jiasheng 1, Zhong Songyi 2,3, Yue Tao 2, Yin Yankun 3, Tang Yanhao 3

Affiliation:

1. School of Computer Engineering and Science, Shanghai University, No. 99 Shangda Road, Shanghai 200444, China

2. School of Mechatronic Engineering and Automation, Shanghai University, No. 99 Shangda Road, Shanghai 200444, China

3. School of Artificial Intelligence, Shanghai University, No. 99 Shangda Road, Shanghai 200444, China

Abstract

Fusing multiple sensor perceptions, specifically LiDAR and camera, is a prevalent method for target recognition in autonomous driving systems. Traditional object detection algorithms are limited by the sparse nature of LiDAR point clouds, resulting in poor fusion performance, especially for detecting small and distant targets. In this paper, a multi-task parallel neural network based on the Transformer is constructed to simultaneously perform depth completion and object detection. The loss functions are redesigned to reduce environmental noise in depth completion, and a new fusion module is designed to enhance the network’s perception of the foreground and background. The network leverages the correlation between RGB pixels for depth completion, completing the LiDAR point cloud and addressing the mismatch between sparse LiDAR features and dense pixel features. Subsequently, we extract depth map features and effectively fuse them with RGB features, fully utilizing the depth feature differences between foreground and background to enhance object detection performance, especially for challenging targets. Compared to the baseline network, improvements of 4.78%, 8.93%, and 15.54% are achieved in the difficult indicators for cars, pedestrians, and cyclists, respectively. Experimental results also demonstrate that the network achieves a speed of 38 fps, validating the efficiency and feasibility of the proposed method.
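
The abstract does not state the redesigned loss functions, so the following is only a minimal sketch of the general idea of training depth completion and detection together. It assumes a depth loss evaluated only at pixels that have a LiDAR return, added to a detection loss with a fixed weight. The function names, the L1 form, and the weight `depth_weight=0.5` are hypothetical choices for illustration, not the authors' formulation.

```python
import numpy as np


def masked_l1_depth_loss(pred_depth, sparse_gt_depth):
    """Mean absolute depth error over pixels with a LiDAR return.

    Assumption: a value of 0 in sparse_gt_depth marks a pixel with no
    LiDAR measurement, so it is excluded from the loss.
    """
    valid = sparse_gt_depth > 0
    if not valid.any():
        return 0.0
    return float(np.abs(pred_depth[valid] - sparse_gt_depth[valid]).mean())


def multi_task_loss(depth_loss, detection_loss, depth_weight=0.5):
    """Weighted sum of the two task losses (the weight is hypothetical)."""
    return detection_loss + depth_weight * depth_loss


# Toy 2x3 depth map: only two pixels carry a LiDAR measurement.
sparse_gt = np.array([[0.0, 10.0, 0.0],
                      [0.0, 0.0, 20.0]])
pred = np.array([[5.0, 12.0, 7.0],
                 [9.0, 3.0, 19.0]])

depth_loss = masked_l1_depth_loss(pred, sparse_gt)
# The detection loss value here is a placeholder, not a real detector output.
total_loss = multi_task_loss(depth_loss, detection_loss=2.0, depth_weight=0.5)
print(depth_loss, total_loss)
```

Masking the loss this way means the dense prediction is supervised only where sparse ground truth exists, which is one common way to handle the sparsity of LiDAR depth described in the abstract.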

Funder

National Natural Science Foundation of China

Shanghai Science and Technology Committee Natural Science Program

Publisher

MDPI AG
