Custom Anchorless Object Detection Model for 3D Synthetic Traffic Sign Board Dataset with Depth Estimation and Text Character Extraction

Published: 2024-07-21; Volume: 14; Issue: 14; Page: 6352
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en

Author:

Soans Rahul¹, Fukumizu Yohei¹

Affiliation:

1. Graduate School of Science and Engineering, Ritsumeikan University, Kusatsu 525-8577, Japan

Abstract

This paper introduces an anchorless deep learning model designed for efficient analysis and processing of large-scale 3D synthetic traffic sign board datasets. As autonomous driving systems rely increasingly on precise environmental perception, accurate interpretation of traffic sign information is crucial. Our model integrates object detection, depth estimation, deformable parts, and text character extraction, enabling a comprehensive understanding of road signs in simulated environments that mimic the real world. The dataset contains a large number of synthetically generated traffic signs covering 183 classes. The signs include place names in Japanese and English, expressway names in Japanese and English, distances and motorway numbers, and direction arrow marks, rendered under varying lighting, occlusion, viewing angles, camera distortion, day and night cycles, and adverse weather such as rain, snow, and fog, so that the model could be tested thoroughly across a wide range of difficult conditions. We developed a convolutional neural network with a modified lightweight hourglass backbone that uses depthwise spatial and pointwise convolutions, along with spatial and channel attention modules that produce resilient feature maps. We benchmarked our model against the baseline model and observed improved accuracy and efficiency in both depth estimation and text extraction, which are crucial for real-time applications in autonomous navigation systems. With its efficiency and partwise decoded predictions, combined with Optical Character Recognition (OCR), our approach shows potential as a valuable tool for developers of Advanced Driver-Assistance Systems (ADAS), Autonomous Vehicle (AV) technologies, and transportation safety applications, in support of reliable navigation solutions.

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/14/6352/pdf

References (60 articles)

1. Denninger (2023). BlenderProc2: A Procedural Pipeline for Photorealistic Rendering. J. Open Source Softw.

2. Community, B.O. (2022, April 15). Blender—A 3D Modelling and Rendering Package. Stichting Blender Foundation, Amsterdam: Blender Foundation. Available online: https://www.blender.org.

3. Haas, J.K. (2014). A History of the Unity Game Engine. [Ph.D. Thesis, Worcester Polytechnic Institute]. Available online: https://www.unity.com.

4. Leibe, B., Matas, J., Sebe, N., and Welling, M. (Eds.) (2016). Playing for Data: Ground Truth from Computer Games. Computer Vision—ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Springer International Publishing. Lecture Notes in Computer Science.

5. Tremblay, J., Prakash, A., Acuna, D., Brophy, M., Jampani, V., Anil, C., To, T., Cameracci, E., Boochoon, S., and Birchfield, S. (2018, January 18–22). Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA.