Affiliation:
1. Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110819, China
Abstract
Transformers model long-range dependencies more effectively than CNNs, which has drawn growing attention to their use in semantic segmentation. This work introduces LACTNet, a semantic segmentation model that combines Transformer and CNN architectures to process local and global contextual features in real time. LACTNet uses a lightweight Transformer, which integrates a specially designed gated convolutional feedforward network, to establish feature dependencies across distant regions. A Lightweight Average Feature Bottleneck (LAFB) module captures spatial detail information within the features, thereby improving segmentation accuracy. To address the loss of spatial features in the decoder, a Feature Fusion Enhancement Module (FFEM) provides long skip connections that preserve the integrity of spatial features and strengthen feature interaction in the decoder. LACTNet is evaluated on two datasets: it achieves 74.8% mIoU at 90 FPS on Cityscapes and 71.8% mIoU at 126 FPS on CamVid.
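For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from flattened label maps. It is not taken from the paper: the function name `mean_iou`, the `ignore_index` value of 255 for unlabeled pixels, and the choice to average only over classes that actually occur are all assumptions made for illustration. Benchmark protocols typically accumulate counts over the whole dataset before averaging.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flattened per-pixel class ids of equal length.
    Pixels whose target equals ignore_index (an assumed convention for
    unlabeled pixels) are skipped.
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, last pixel unlabeled.
score = mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 255], num_classes=3)
```

In this toy example the per-class IoUs are 1/2, 2/3, and 1, so `score` is their average.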
Funder
National Natural Science Foundation of China
Natural Science Foundation of Liaoning Province
National Key R&D Program Project of China
Fundamental Research Funds for the Central Universities