Affiliation:
1. School of Information and Electrical Engineering, Hebei University of Engineering, No.19 Taiji Road, Handan, Hebei 056038, China
2. Hebei Key Laboratory of Security & Protection Information Sensing and Processing, No.19 Taiji Road, Handan, Hebei 056038, China
Abstract
Numerous high-accuracy models have been proposed for semantic segmentation, but they tend to have many parameters and slow inference. Real-time semantic segmentation of urban scenes requires a balance among accuracy, inference speed, and model size. In this paper, we address this challenge with the efficient asymmetric attention module net (EAAMNet), which adopts an asymmetric encoder–decoder structure for the semantic segmentation of urban scenes. The encoder uses an efficient asymmetric attention module to form the network backbone. For the decoder, we propose a lightweight multi-feature fusion design that maintains good segmentation accuracy with a small number of parameters. Our extensive evaluations show that EAAMNet achieves a favorable balance among inference speed, parameter count, and segmentation accuracy, making it well suited to real-time semantic segmentation of urban scenes. Without any pre-training, EAAMNet attains 73.31% mIoU at 128 fps on Cityscapes and 69.32% mIoU at 141 fps on CamVid. Compared with state-of-the-art models, our approach has a comparable number of parameters while improving both accuracy and speed.
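For readers unfamiliar with the accuracy metric reported above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from per-pixel class labels. It is not the evaluation code used for EAAMNet: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in pred or target.

    pred, target: equal-length sequences of integer class labels,
    one per pixel (hypothetical flat layout, for illustration only).
    """
    # confusion[t][p] counts pixels of true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]                      # correctly labeled pixels
        fn = sum(confusion[c]) - tp               # class c pixels that were missed
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # wrongly labeled c
        union = tp + fp + fn
        if union > 0:                             # skip classes absent from both
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Benchmark protocols may differ from this sketch in detail, for example by accumulating the confusion matrix over the whole test set and excluding ignored labels before averaging.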
Publisher
Fuji Technology Press Ltd.