Efficient Object Detection Using Semantic Region of Interest Generation with Light-Weighted LiDAR Clustering in Embedded Processors
Author:
Jung Dongkyu 1, Chong Taewon 2,3, Park Daejin 1
Affiliation:
1. School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Republic of Korea
2. Carnavicom Co., Ltd., Incheon 21984, Republic of Korea
3. Department of Physics, Hanyang University, Seoul 04763, Republic of Korea
Abstract
Many fields are currently investigating the use of convolutional neural networks (CNNs) to detect specific objects in three-dimensional data. Algorithms based on three-dimensional data are more stable and less sensitive to lighting conditions than algorithms based on two-dimensional image data, but they require far more computation, which makes it difficult to run 3D CNN algorithms on lightweight embedded systems. In this paper, we propose a method that processes three-dimensional data with simple algorithms instead of expensive operations such as convolution, and exploits the physical characteristics of that data to generate regions of interest (ROIs) for a CNN object detection algorithm based on two-dimensional image data. After preprocessing, the LiDAR point cloud is separated into individual objects by clustering; physical characteristics useful for semantic detection are extracted from each cluster, and semantic detection is performed by a classifier trained with machine learning. Final object recognition is then performed by a 2D image-based object detection algorithm: the location and size of each object initially found by semantic detection are used to generate individual 2D image regions, bypassing the process of tracking bounding boxes over the whole image. The physical characteristics of the 3D data thus improve the accuracy of a 2D image-based object detection algorithm, even in environments where it is difficult to collect data from camera sensors, while keeping the system lighter than 3D data-based object detection algorithms. On an embedded board, the proposed model achieved an accuracy of 81.84% with the YOLO v5 algorithm, 1.92% higher than the typical model. It achieves 47.41% accuracy in an environment with 40% higher brightness and 54.12% accuracy in an environment with 40% lower brightness, which are 8.97% and 13.58% higher than the general model, respectively, so it maintains high accuracy even in non-optimal brightness environments. The proposed technique also has the advantage of reducing the execution time, depending on the operating environment of the detection model.
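As a rough illustration of the pipeline described in the abstract, the sketch below clusters a preprocessed (ground-removed) LiDAR point cloud into individual objects, extracts simple physical features for a lightweight classifier, and projects each cluster's 3D bounding box into a 2D image ROI. This is not the authors' implementation: the use of DBSCAN, its parameters, the feature set (extent and point density), and the calibrated 3x4 projection matrix K are illustrative assumptions.

```python
# Minimal sketch of the LiDAR-to-ROI pipeline (assumed details, not the
# paper's actual code): cluster points, extract physical features for a
# machine-learning classifier, and project 3D boxes to 2D image ROIs.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_points(points, eps=0.5, min_samples=10):
    """Group a preprocessed N x 3 point cloud into individual objects."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return [points[labels == k] for k in set(labels) if k != -1]  # -1 = noise

def physical_features(cluster):
    """Hand-crafted physical features (extent and point density) for a
    lightweight classifier; the paper's actual feature set may differ."""
    mins, maxs = cluster.min(axis=0), cluster.max(axis=0)
    extent = maxs - mins                         # width, length, height (m)
    volume = np.prod(np.maximum(extent, 1e-3))   # avoid division by zero
    return np.array([*extent, len(cluster) / volume])

def roi_from_cluster(cluster, K):
    """Project the cluster's axis-aligned 3D bounding box through an
    assumed calibrated 3x4 projection matrix K to a 2D pixel ROI."""
    mins, maxs = cluster.min(axis=0), cluster.max(axis=0)
    corners = np.array([[x, y, z] for x in (mins[0], maxs[0])
                                  for y in (mins[1], maxs[1])
                                  for z in (mins[2], maxs[2])])
    homo = np.hstack([corners, np.ones((8, 1))])   # homogeneous coordinates
    uvw = (K @ homo.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                  # perspective divide
    u0, v0 = uv.min(axis=0)
    u1, v1 = uv.max(axis=0)
    return int(u0), int(v0), int(u1), int(v1)      # image crop for the detector
```

In use, each ROI crop would be handed to the 2D detector (here, YOLO v5) only for clusters that the classifier flags as candidate objects, which is what allows the full-image detection pass to be skipped on the embedded board.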
Funder
National Research Foundation of Korea (NRF) funded by the Ministry of Education
Subject
Electrical and Electronic Engineering, Biochemistry, Instrumentation, Atomic and Molecular Physics, and Optics, Analytical Chemistry