Affiliation:
1. School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada
Abstract
Perception systems in autonomous vehicles must accurately detect and classify objects in their surrounding environment. These vehicles carry numerous types of sensors, and combining such multimodal data streams can significantly boost performance. The authors introduce a novel sensor fusion framework based on deep convolutional neural networks that employs both camera and LiDAR sensors in a multimodal, multiview configuration. They leverage both data types by introducing two new fusion mechanisms: element-wise multiplication and multimodal factorised bilinear pooling. These methods improve the bird's-eye-view moderate average precision score on the KITTI dataset by +4.97% and +8.35%, respectively, compared with traditional fusion operators such as element-wise addition and feature-map concatenation. An in-depth analysis of key design choices affecting performance, such as data augmentation, multi-task learning, and convolutional architecture design, is offered. The study aims to pave the way for the development of more robust multimodal machine vision systems. The authors conclude the paper with qualitative results, discussing both successful and problematic cases, along with potential ways to mitigate the latter.
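To make the contrast between the fusion operators concrete, the following is a minimal NumPy sketch of the two baseline operators (element-wise addition and concatenation) and the two proposed ones (element-wise multiplication and a simplified multimodal factorised bilinear pooling). The feature vectors, dimensions, and random projection matrices are illustrative assumptions, not the paper's actual architecture or learned weights.

```python
import numpy as np

def fuse_add(a, b):
    # Baseline operator 1: element-wise addition of two feature vectors.
    return a + b

def fuse_concat(a, b):
    # Baseline operator 2: feature concatenation along the channel axis.
    return np.concatenate([a, b], axis=-1)

def fuse_multiply(a, b):
    # Proposed operator 1: element-wise (Hadamard) multiplication.
    return a * b

def fuse_mfb(a, b, out_dim=8, factor_k=3, seed=0):
    # Proposed operator 2, sketched: multimodal factorised bilinear pooling.
    # Both modalities are projected to (out_dim * factor_k) dimensions,
    # multiplied element-wise, sum-pooled over the factor dimension,
    # then power- and l2-normalised. Projections here are random stand-ins
    # for learned weights.
    rng = np.random.default_rng(seed)
    Wa = rng.standard_normal((a.shape[-1], out_dim * factor_k))
    Wb = rng.standard_normal((b.shape[-1], out_dim * factor_k))
    z = (a @ Wa) * (b @ Wb)                       # joint bilinear features
    z = z.reshape(out_dim, factor_k).sum(axis=1)  # sum pooling over k
    z = np.sign(z) * np.sqrt(np.abs(z))           # signed square-root norm
    return z / (np.linalg.norm(z) + 1e-12)        # l2 normalisation

# Stand-in camera and LiDAR feature vectors (4-D for illustration).
cam = np.ones(4)
lidar = np.arange(4.0)

print(fuse_multiply(cam, lidar))      # [0. 1. 2. 3.]
print(fuse_concat(cam, lidar).shape)  # (8,)
print(fuse_mfb(cam, lidar).shape)     # (8,)
```

Element-wise multiplication keeps the fused dimensionality equal to the input, while MFB approximates a full bilinear interaction between modalities at a fraction of its parameter cost.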
Publisher
Institution of Engineering and Technology (IET)