UNet-like network fused swin transformer and CNN for semantic image synthesis

Published: 2024-07-21 Issue: 1 Volume: 14 Page:
ISSN: 2045-2322
Container-title: Scientific Reports
Language: en
Short-container-title: Sci Rep

Author:

Ke Aihua, Luo Jian, Cai Bo

Abstract

Semantic image synthesis approaches have been dominated by Convolutional Neural Network (CNN) models. Due to the limitations of local perception, their performance improvement seems to have plateaued in recent years. To tackle this issue, we propose the SC-UNet model, a UNet-like network that fuses Swin Transformer and CNN for semantic image synthesis. Photorealistic image synthesis conditioned on a given semantic layout depends on high-level semantics and low-level positions. To improve the synthesis performance, we design a novel conditional residual fusion module for the model decoder to efficiently fuse the hierarchical feature maps extracted at different scales. Moreover, this module combines an opposition-based learning mechanism and a weight assignment mechanism to enhance and attend to the semantic information. Compared to pure CNN-based models, our SC-UNet combines local and global perception to better extract high- and low-level features and better fuse multi-scale features. We have conducted extensive comparison experiments, both quantitative and qualitative, to validate the effectiveness of our proposed SC-UNet model for semantic image synthesis. The results show that SC-UNet distinctly outperforms the state-of-the-art model on three benchmark datasets (Cityscapes, ADE20K, and COCO-Stuff), which include numerous real-scene images.
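The abstract names the ingredients of the decoder's fusion module (multi-scale fusion, a residual path, weight assignment, and opposition-based conditioning on the semantic layout) but does not give its equations. The NumPy sketch below is only a rough illustration of how such ingredients could fit together; it is not the authors' implementation. Every function name, the channel-mean scoring rule, and the gains `g_pos` and `g_neg` are hypothetical assumptions introduced here.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array along H and W."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def opposite(mask, low=0.0, high=1.0):
    """Opposition-based counterpart of values in [low, high]: low + high - x."""
    return low + high - mask


def softmax(scores, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def fuse(fine, coarse, mask, g_pos=0.5, g_neg=-0.5):
    """Hypothetical conditional residual fusion of two feature scales.

    fine:   (C, H, W) features at the decoder's current resolution
    coarse: (C, H/2, W/2) features from the next coarser scale
    mask:   (1, H, W) semantic mask with values in [0, 1]
    g_pos, g_neg: assumed scalar gains for the mask and its opposite
    """
    coarse_up = upsample_nearest(coarse, 2)

    # Weight assignment (assumed rule): score each branch per pixel by its
    # mean activation, then normalise the two scores with a softmax.
    scores = np.stack([fine.mean(axis=0), coarse_up.mean(axis=0)])  # (2, H, W)
    weights = softmax(scores, axis=0)
    blended = weights[0] * fine + weights[1] * coarse_up

    # Conditioning: modulate with the mask and its opposition-based counterpart.
    conditioned = blended * (1.0 + g_pos * mask + g_neg * opposite(mask))

    # Residual connection back to the fine-scale features.
    return fine + conditioned


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fine = rng.normal(size=(8, 16, 16))
    coarse = rng.normal(size=(8, 8, 8))
    mask = rng.uniform(size=(1, 16, 16))
    print(fuse(fine, coarse, mask).shape)
```

In a trained network the branch scores and gains would be learned (for example by small convolutional layers) rather than fixed as they are in this sketch.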

Funder

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41598-024-65585-1.pdf

Cited by 1 article.

1. Text-guided image-to-sketch diffusion models;Knowledge-Based Systems;2024-11