Optimizing and interpreting the latent space of the conditional text-to-image GANs

Author:

Zhang ZhenxingORCID,Schomaker Lambert

Abstract

AbstractText-to-image generation intends to automatically produce a photo-realistic image, conditioned on a textual description. To facilitate the real-world applications of text-to-image synthesis, we focus on studying the following three issues: (1) How to ensure that generated samples are believable, realistic or natural? (2) How to exploit the latent space of the generator to edit a synthesized image? (3) How to improve the explainability of a text-to-image generation framework? We introduce two new data sets for benchmarking, i.e., the Good & Bad, bird and face, data sets consisting of successful as well as unsuccessful generated samples. This data set can be used to effectively and efficiently acquire high-quality images by increasing the probability of generating Good latent codes with a separate, new classifier. Additionally, we present a novel algorithm which identifies semantically understandable directions in the latent space of a conditional text-to-image GAN architecture by performing independent component analysis on the pre-trained weight values of the generator. Furthermore, we develop a background-flattening loss (BFL), to improve the background appearance in the generated images. Subsequently, we introduce linear-interpolation analysis between pairs of text keywords. This is extended into a similar triangular ‘linguistic’ interpolation. The visual array of interpolation results gives users a deep look into what the text-to-image synthesis model has learned within the linguistic embeddings. Experimental results on the recent DiverGAN generator, pre-trained on three common benchmark data sets demonstrate that our classifier achieves a better than 98% accuracy in predicting Good/Bad classes for synthetic samples and our proposed approach is able to derive various interpretable semantic properties for the text-to-image GAN model.

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Software

Cited by 3 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Foundations of Generative AI;Advances in Computational Intelligence and Robotics;2024-06-28

2. Semantic Distance Adversarial Learning for Text-to-Image Synthesis;IEEE Transactions on Multimedia;2024

3. Optimizing and interpreting the latent space of the conditional text-to-image GANs;Neural Computing and Applications;2023-11-21

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3