Synthetic Data Enhancement and Network Compression Technology of Monocular Depth Estimation for Real-Time Autonomous Driving System

Published:2024-06-28 Issue:13 Volume:24 Page:4205
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Jun Woomin¹², Yoo Jisang²³, Lee Sungjin¹²

Affiliation:

1. Electronic Engineering, Dong Seoul University, Seongnam 13117, Republic of Korea

2. Autonomous Driving Lab, Modulabs, Seoul 06252, Republic of Korea

3. College of Electronics and Information, Kyung Hee University, 1732, Deogyeong-Daero, Giheung-gu, Yongin-si 17104, Republic of Korea

Abstract

Accurate 3D image recognition, critical for autonomous driving safety, is shifting from the LIDAR-based point cloud to camera-based depth estimation technologies driven by cost considerations and the point cloud’s limitations in detecting distant small objects. This research aims to enhance MDE (Monocular Depth Estimation) using a single camera, offering extreme cost-effectiveness in acquiring 3D environmental data. In particular, this paper focuses on novel data augmentation methods designed to enhance the accuracy of MDE. Our research addresses the challenge of limited MDE data quantities by proposing the use of synthetic-based augmentation techniques: Mask, Mask-Scale, and CutFlip. The implementation of these synthetic-based data augmentation strategies has demonstrably enhanced the accuracy of MDE models by 4.0% compared to the original dataset. Furthermore, this study introduces the RMS (Real-time Monocular Depth Estimation configuration considering Resolution, Efficiency, and Latency) algorithm, designed for the optimization of neural networks to augment the performance of contemporary monocular depth estimation technologies through a three-step process. Initially, it selects a model based on minimum latency and REL criteria, followed by refining the model’s accuracy using various data augmentation techniques and loss functions. Finally, the refined model is compressed using quantization and pruning techniques to minimize its size for efficient on-device real-time applications. Experimental results from implementing the RMS algorithm indicated that, within the required latency and size constraints, the IEBins model exhibited the most accurate REL (absolute RELative error) performance, achieving a 0.0480 REL. Furthermore, the data augmentation combination of the original dataset with Flip, Mask, and CutFlip, alongside the SigLoss loss function, displayed the best REL performance, with a score of 0.0461. The network compression technique using FP16 was analyzed as the most effective, reducing the model size by 83.4% compared to the original while maintaining the least impact on REL performance and latency. Finally, the performance of the RMS algorithm was validated on the on-device autonomous driving platform, NVIDIA Jetson AGX Orin, through which optimal deployment strategies were derived for various applications and scenarios requiring autonomous driving technologies.
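The REL metric and the first RMS step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes REL is the standard absolute relative error, mean(|pred - gt| / gt) over pixels with valid ground truth, and it reads the selection step as "pick the lowest-REL model that fits the latency and size budgets". The `Candidate` class, the `select_model` helper, and all model names and numbers in the usage comment are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Absolute relative error: mean(|pred - gt| / gt) over pixels with gt > 0."""
    # Pixels without ground truth (gt <= 0) are skipped, a common convention.
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth depths")
    return sum(errors) / len(errors)


@dataclass
class Candidate:
    """Hypothetical record of one benchmarked depth model."""
    name: str
    rel: float          # absolute relative error (lower is better)
    latency_ms: float   # measured inference latency
    size_mb: float      # model size on disk


def select_model(
    candidates: Sequence[Candidate],
    max_latency_ms: float,
    max_size_mb: float,
) -> Optional[Candidate]:
    """Return the lowest-REL candidate that meets both budgets, or None."""
    feasible = [
        c for c in candidates
        if c.latency_ms <= max_latency_ms and c.size_mb <= max_size_mb
    ]
    return min(feasible, key=lambda c: c.rel, default=None)


# Usage with made-up numbers (not results from the paper):
# models = [Candidate("model_a", 0.05, 30.0, 100.0),
#           Candidate("model_b", 0.04, 80.0, 100.0)]
# best = select_model(models, max_latency_ms=50.0, max_size_mb=200.0)
```

Filtering on the budgets first and ranking by REL afterwards keeps the accuracy comparison restricted to models that could actually be deployed on the target device.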

Funder

National Research Foundation of Korea

Publisher

MDPI AG

Link

https://www.mdpi.com/1424-8220/24/13/4205/pdf

Cited by 1 article.

1. A Self-Supervised Few-Shot Semantic Segmentation Method Based on Multi-Task Learning and Dense Attention Computation;Sensors;2024-07-31