Affiliation:
1. College of Computer Science and Technology, Jilin University, Changchun, China
2. College of Computer Science and Engineering, Changchun University of Technology, Changchun, China
Abstract
Due to the large computational and GPU memory cost of semantic segmentation, some works focus on designing lightweight models that achieve a good trade‐off between computational cost and accuracy. A common approach is to combine a CNN with a vision transformer. However, these methods ignore the contextual information available from multiple receptive fields, and existing methods often fail to compensate for the detailed information lost during the downsampling of multi‐scale features. To address these issues, we propose AG Self‐Attention, which consists of Enhanced Atrous Self‐Attention (EASA) and Gate Attention. AG Self‐Attention adds the contextual information of multiple receptive fields into the global semantic feature. Specifically, Enhanced Atrous Self‐Attention uses weight‐shared atrous convolutions with different atrous rates to capture contextual information under different receptive fields. Gate Attention introduces a gating mechanism that injects detailed information into the global semantic feature and filters it by producing a “fusion” gate and an “update” gate. To validate our approach, we conduct extensive experiments on common semantic segmentation datasets, including ADE20K, COCO‐Stuff, PASCAL Context, and Cityscapes, and show that our method achieves state‐of‐the‐art performance with a good trade‐off between computational cost and accuracy.
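To make the two mechanisms described above concrete, the following is a minimal PyTorch sketch of weight‐shared atrous context gathering and a fusion/update gating step. The module names, channel sizes, and the exact fusion formula are illustrative assumptions for this sketch, not the paper's reference implementation.

```python
# Minimal sketch, assuming a PyTorch implementation. Kernel sizes, atrous
# rates and the gating formula are assumptions made for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EnhancedAtrousSelfAttention(nn.Module):
    """Weight-shared atrous convolutions gather context at several
    receptive fields; the summed context re-weights the input feature."""

    def __init__(self, channels: int, rates=(1, 2, 3)):
        super().__init__()
        self.rates = rates
        # One 3x3 kernel shared across all atrous rates (weight sharing).
        self.weight = nn.Parameter(torch.empty(channels, channels, 3, 3))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ctx = torch.zeros_like(x)
        for r in self.rates:
            # Same weights, different dilation -> different receptive field.
            ctx = ctx + F.conv2d(x, self.weight, padding=r, dilation=r)
        # Use the multi-receptive-field context as an attention map over x.
        return x * torch.sigmoid(self.proj(ctx))


class GateAttention(nn.Module):
    """'Fusion' and 'update' gates decide how much low-level detail is
    injected into the global semantic feature."""

    def __init__(self, channels: int):
        super().__init__()
        self.fusion_gate = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.update_gate = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, semantic: torch.Tensor, detail: torch.Tensor) -> torch.Tensor:
        # Upsample the semantic feature to the detail resolution if needed.
        if semantic.shape[-2:] != detail.shape[-2:]:
            semantic = F.interpolate(semantic, size=detail.shape[-2:],
                                     mode="bilinear", align_corners=False)
        pair = torch.cat([semantic, detail], dim=1)
        f = torch.sigmoid(self.fusion_gate(pair))   # filters detailed information
        u = torch.sigmoid(self.update_gate(pair))   # controls how much is injected
        return (1 - u) * semantic + u * (f * detail)


if __name__ == "__main__":
    sem = torch.randn(1, 64, 32, 32)   # low-resolution global semantic feature
    det = torch.randn(1, 64, 64, 64)   # higher-resolution detail feature
    out = GateAttention(64)(EnhancedAtrousSelfAttention(64)(sem), det)
    print(out.shape)  # torch.Size([1, 64, 64, 64])
```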
Publisher
Institution of Engineering and Technology (IET)
Subject
Computer Vision and Pattern Recognition, Software