Affiliation:
1. School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China
2. China Telecom Co., Ltd., Anhui Branch, Hefei 230000, China
Abstract
Deep learning has advanced rapidly and is now widely applied to semantic segmentation, where it has significantly improved segmentation quality. However, achieving a reasonable trade-off among accuracy, model size, and inference speed remains a key challenge. In this paper, we propose a lightweight multi-scale asymmetric encoder–decoder network (LMANet). First, an optimized bottleneck module extracts features at different levels, applying different receptive fields to capture effective information at different scales. Then, a channel-attention module and a feature-extraction module are combined into a residual structure, and the resulting feature maps are connected by a feature-fusion module to improve segmentation accuracy. Finally, a lightweight multi-scale decoder is designed to recover the image, with a spatial-attention module added to recover spatial details. We evaluate the proposed method on the Cityscapes and CamVid datasets, where it achieves a mean intersection over union (mIoU) of 73.9% and 71.3% at inference speeds of 111 FPS and 118 FPS, respectively, with only 0.85 M parameters.
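For readers unfamiliar with the mIoU metric reported in the abstract, the following is a minimal sketch of how it is commonly computed from a confusion matrix. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou` are hypothetical, and the handling of absent classes (skipped here) is an assumption, since the abstract does not describe the evaluation protocol.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes with no ground-truth and no predicted pixels
    are skipped rather than counted as zero.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with six "pixels" and three classes (illustrative labels only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(confusion_matrix(y_true, y_pred, num_classes=3)))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5 up to floating-point rounding. Benchmark evaluations typically accumulate the confusion matrix over all test images before averaging.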
Funder
National Key Research and Development Project
Key Teaching Research Project of Anhui Province