Abstract
Convolutional neural networks have demonstrated strong ability to capture local features and spatial detail; however, they struggle to model global context, which can compromise the segmentation of important image regions. Transformers can increase the expressiveness of pixels by establishing global relationships among them, yet many transformer-based self-attention methods do not exploit the advantages of convolution, so the resulting models require more parameters and computation. To address these two issues, this work combines Transformer and CNN structures to strengthen the relationship between image-level regions and global information, improving semantic segmentation accuracy and efficiency. First, we build a Feature Alignment Module (FAM) to enhance spatial details and improve channel representations. Second, we use a Transformer structure to model the relationships between similar pixels, which enhances their representations. Finally, we design a Pyramid Convolutional Pooling Module (PCPM) that compresses and enriches the feature maps while capturing global correlations among pixels, reducing the computational burden on the transformer. Together, these three components form a transformer-based feature fusion network for semantic segmentation (FFTNet). Experimental results show that our method achieves 82.5% mIoU on the Cityscapes test dataset. Furthermore, visualization experiments on the Pascal VOC 2012 and Cityscapes datasets demonstrate that our approach outperforms alternative methods.