Authors:
Yahya Massoud, Robert Laganière
Abstract
In this work, we propose a novel deep learning-based sensor fusion framework that uses both camera and LiDAR sensors in a multi-modal, multi-view setting. To leverage both data streams, we incorporate two new, more sophisticated fusion mechanisms: element-wise multiplication and multi-modal factorized bilinear (MFB) pooling. Compared to previously used fusion operators such as element-wise addition and concatenation of feature maps, our proposed fusion methods significantly increase the bird's-eye-view moderate average precision score, by +4.97% and +8.35% respectively, when evaluated on the KITTI object detection dataset. Furthermore, we provide a detailed study of important design choices that contribute to the performance of deep learning-based sensor fusion frameworks, such as data augmentation, multi-task learning, and the design of the convolutional architecture. Finally, we provide qualitative results that showcase both success and failure cases for our proposed framework, and we discuss directions for mitigating the failure cases.
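For readers unfamiliar with the two fusion operators named in the abstract, the sketch below illustrates their general form in PyTorch. This is a minimal illustration only, not the authors' implementation: the module names, projection widths, factor count k, and output size are assumptions, and MFB here follows the commonly published recipe (project both modalities to k*out_dim, multiply element-wise, sum-pool over the k factors, then power- and L2-normalize).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ElementwiseMultiplyFusion(nn.Module):
    """Fuse two feature maps by projecting each to a common channel width
    and taking their Hadamard (element-wise) product. Illustrative sketch."""

    def __init__(self, cam_channels: int, lidar_channels: int, out_channels: int):
        super().__init__()
        self.proj_cam = nn.Conv2d(cam_channels, out_channels, kernel_size=1)
        self.proj_lidar = nn.Conv2d(lidar_channels, out_channels, kernel_size=1)

    def forward(self, cam_feat: torch.Tensor, lidar_feat: torch.Tensor) -> torch.Tensor:
        # Element-wise multiplication of the two projected feature maps.
        return self.proj_cam(cam_feat) * self.proj_lidar(lidar_feat)


class MFBFusion(nn.Module):
    """Multi-modal factorized bilinear pooling over two feature vectors:
    a low-rank factorization of a full bilinear interaction. Illustrative sketch."""

    def __init__(self, cam_dim: int, lidar_dim: int, out_dim: int, k: int = 5):
        super().__init__()
        self.k, self.out_dim = k, out_dim
        self.proj_cam = nn.Linear(cam_dim, k * out_dim)
        self.proj_lidar = nn.Linear(lidar_dim, k * out_dim)

    def forward(self, cam_feat: torch.Tensor, lidar_feat: torch.Tensor) -> torch.Tensor:
        joint = self.proj_cam(cam_feat) * self.proj_lidar(lidar_feat)     # (B, k*out_dim)
        joint = joint.view(-1, self.k, self.out_dim).sum(dim=1)           # sum-pool over k factors
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint) + 1e-8)   # signed square-root (power norm)
        return F.normalize(joint, dim=-1)                                 # L2 normalization
```

Relative to element-wise multiplication alone, MFB lets the two modalities interact through learned low-rank bilinear terms, which is one plausible reason it yields the larger of the two reported gains.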
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Cited by
2 articles.