Multi-model Style-aware Diffusion Learning for Semantic Image Synthesis

Authors:

Niu Yunfang (1), Wu Lingxiang (1), Zhang Yufeng (1), Zhu Yousong (1), Zhu Guibo (2), Wang Jinqiao (3)

Affiliation:

1. Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, China and School of Artificial Intelligence, University of Chinese Academy of Sciences, China

2. Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, China, School of Artificial Intelligence, University of Chinese Academy of Sciences, China, and Shanghai Artificial Intelligence Laboratory, China

3. Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, China, School of Artificial Intelligence, University of Chinese Academy of Sciences, China, and Peng Cheng Laboratory, China

Abstract

Semantic image synthesis aims to generate images from given semantic layouts, a challenging task that requires models to capture the relationship between layouts and images. Previous works are typically based on Generative Adversarial Networks (GANs) or autoregressive (AR) models. However, GAN training is unstable, and AR models suffer from an independently trained image encoder and a unidirectional generation bias. Owing to these limitations, such methods tend to synthesize unrealistic, poorly aligned images and consider only single-style generation. In this paper, we propose a Multi-model Style-aware Diffusion Learning (MSDL) framework for semantic image synthesis, comprising a training module and a sampling module. In the training module, a layout-to-image model transfers knowledge from a model pretrained on massive, weakly correlated text-image pairs, making the training process more efficient. In the sampling module, we design a map-guidance technique and a multi-model style-guidance strategy for creating images in multiple styles, e.g., oil painting, Disney cartoon, and pixel art. We evaluate our method on Cityscapes, ADE20K, and COCO-Stuff, making visual comparisons and computing multiple metrics such as FID and LPIPS. Experimental results demonstrate that our model is highly competitive, especially in terms of fidelity and diversity.
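The map- and style-guidance described in the abstract can be illustrated with a classifier-free-guidance-style combination of noise predictions; the paper's exact formulation is not given here, so the function, weights, and toy inputs below are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch: combining unconditional, layout-conditioned, and
# style-conditioned noise estimates into one guided estimate, in the spirit
# of classifier-free guidance. Names and guidance weights are assumptions.
import numpy as np

def guided_noise(eps_uncond, eps_layout, eps_style, w_map=2.0, w_style=1.5):
    """Return a guided noise estimate for one diffusion sampling step."""
    eps = eps_uncond
    eps = eps + w_map * (eps_layout - eps_uncond)    # steer toward the layout
    eps = eps + w_style * (eps_style - eps_uncond)   # steer toward the style
    return eps

# Toy example with random arrays standing in for model outputs.
rng = np.random.default_rng(0)
e_u, e_l, e_s = (rng.standard_normal((4, 4)) for _ in range(3))
out = guided_noise(e_u, e_l, e_s)
print(out.shape)  # (4, 4)
```

With both weights set to zero the estimate reduces to the unconditional prediction, which is the usual sanity check for guidance formulas of this form.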

Publisher

Association for Computing Machinery (ACM)

