Affiliation:
1. College of Automation, Guangxi University of Science and Technology, Liuzhou 545000, China
2. Department of Artificial Intelligence and Manufacturing, Hechi University, Hechi 547000, China
Abstract
Road instance segmentation is vital for autonomous driving, yet current algorithms struggle in complex urban environments: small objects are segmented poorly, mask edge contours are of low quality, processing is slow, and models adapt poorly to new conditions. This paper introduces an enhanced instance segmentation method based on SOLOv2. First, the Bottleneck Transformer (BoT) module is integrated into VoVNetV2 and the standard convolutions are replaced with ghost convolutions; the resulting improved VoVNetV2 replaces ResNet as the backbone to enhance feature extraction and segmentation speed. Second, the algorithm employs Feature Pyramid Grids (FPGs) instead of Feature Pyramid Networks (FPNs), introducing multi-directional lateral connections for better feature fusion. Finally, a Convolutional Block Attention Module (CBAM) is incorporated into the detection head, refining the features with attention weight coefficients in both the channel and spatial dimensions. The experimental results demonstrate the algorithm's effectiveness: it achieves 27.6% mAP on Cityscapes, a 4.2% improvement over SOLOv2, and a segmentation speed of 8.9 FPS, a 1.7 FPS increase over SOLOv2, indicating its practicality for real-world engineering applications.
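As a rough illustration of why replacing standard convolutions with ghost convolutions can reduce cost, the sketch below counts the weights of a plain convolution and of a Ghost-style module, following the commonly used Ghost module formulation (a primary convolution producing a reduced set of "intrinsic" feature maps, plus cheap depthwise operations generating the rest). The layer sizes (256 input and output channels, 3×3 kernels), the ratio of 2, and the function names are illustrative assumptions, not values or code taken from this paper.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a plain k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weights in a Ghost-style module (bias ignored).

    A primary k x k convolution produces c_out / ratio intrinsic maps;
    each intrinsic map then yields (ratio - 1) extra "ghost" maps through
    a cheap depthwise cheap_k x cheap_k convolution.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio in this sketch")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


# Hypothetical layer sizes, chosen only for illustration.
C_IN, C_OUT, K = 256, 256, 3
standard = standard_conv_params(C_IN, C_OUT, K)
ghost = ghost_conv_params(C_IN, C_OUT, K)
print(standard, ghost, round(standard / ghost, 2))  # 589824 296064 1.99
```

Under these assumed sizes the Ghost-style layer needs roughly half the weights of the standard convolution, which is consistent with the speed motivation stated in the abstract; the actual savings in the proposed backbone depend on its real layer configuration.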
Funder
National Natural Science Foundation of China
Natural Science Foundation of Guangxi Province
Innovation Fund of Chinese Universities Industry-University-Research
Young and Middle-aged Teachers in Guangxi Universities
Special Research Project of Hechi University
Project of Outstanding Thousand Young Teachers’ Training in Higher Education Institutions of Guangxi
Guangxi Colleges and Universities Key Laboratory of AI and Information Processing
Education Department of Guangxi Zhuang Autonomous Region
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering