Abstract
We present a scheme of how YOLO can be improved in order to predict the absolute distance of objects using only information from a monocular camera. It is fully integrated into the original architecture by extending the prediction vectors, sharing the backbone’s weights with the bounding box regressor, and updating the original loss function by a part responsible for distance estimation. We designed two ways of handling the distance, class-agnostic and class-aware, proving class-agnostic creates smaller prediction vectors than class-aware and achieves better results. We demonstrate that the subtasks of object detection and distance measurement are in synergy, resulting in the increase of the precision of the original bounding box functionality. We show that using the KITTI dataset, the proposed scheme yields a mean relative error of 11% considering all eight classes and the distance range within [0, 150] m, which makes the solution highly competitive with existing approaches. Finally, we show that the inference speed is identical to the unmodified YOLO, 45 frames per second.
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
Reference53 articles.
1. Estimation of Absolute Scale in Monocular SLAM Using Synthetic Data
2. Depth Sensing Beyond LiDAR Range
3. Yolov3: An incremental improvement;Redmon;arXiv,2018
4. Yolov4: Optimal speed and accuracy of object detection;Bochkovskiy;arXiv,2020
Cited by
44 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献