Affiliation:
1. Nanjing University of Posts and Telecommunications, Nanjing, China
Abstract
Text-to-image synthesis aims to generate an accurate and semantically consistent image from a given text description. However, existing generative methods struggle to produce semantically complete images from a single piece of text. Some works try to expand the input text into multiple captions by retrieving similar descriptions from the training set, but they still fail to fill in missing image semantics. In this article, we propose a GAN-based approach to Imagine, Select, and Fuse for text-to-image synthesis, named ISF-GAN. The proposed ISF-GAN consists of an Imagine Stage and a Select and Fuse Stage to address these problems. First, the Imagine Stage introduces a text completion and enrichment module, which guides a GPT-based model to enrich the text expression beyond the original dataset. Second, the Select and Fuse Stage selects qualified text descriptions and then applies a cross-modal attention mechanism that lets these sentence embeddings interact with image features at different scales. In short, our model enriches the input text to complete missing semantics and uses cross-modal attention to maximize the utilization of the enriched text, yielding semantically consistent images. Experimental results on the CUB, Oxford-102, and CelebA-HQ datasets demonstrate the effectiveness and superiority of the proposed network. Code is available at
https://github.com/Feilingg/ISF-GAN
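As a rough illustration of how a cross-modal attention mechanism can let multiple sentence embeddings interact with image features at a given scale, a minimal sketch follows. The module name, layer choices, and tensor shapes are assumptions for exposition only, not the authors' implementation; the official code is in the repository linked above.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Hypothetical sketch: fuse a set of sentence embeddings (B, N, D)
    into an image feature map (B, C, H, W) via attention."""

    def __init__(self, img_channels: int, text_dim: int):
        super().__init__()
        # Project image features into the text embedding space to form queries.
        self.query = nn.Conv2d(img_channels, text_dim, kernel_size=1)
        # Fuse the attended text context back into the image feature map.
        self.fuse = nn.Conv2d(img_channels + text_dim, img_channels, kernel_size=1)

    def forward(self, img_feat: torch.Tensor, sent_embs: torch.Tensor) -> torch.Tensor:
        B, C, H, W = img_feat.shape
        # Queries: one per spatial location, shape (B, H*W, D).
        q = self.query(img_feat).flatten(2).transpose(1, 2)
        # Attention weights over the N sentence embeddings, shape (B, H*W, N).
        attn = torch.softmax(q @ sent_embs.transpose(1, 2), dim=-1)
        # Per-location text context, reshaped back to (B, D, H, W).
        ctx = (attn @ sent_embs).transpose(1, 2).reshape(B, -1, H, W)
        # Concatenate and project back to the original channel width.
        return self.fuse(torch.cat([img_feat, ctx], dim=1))

# Example usage with made-up sizes: 8x8 feature map, 5 candidate sentences.
if __name__ == "__main__":
    module = CrossModalAttention(img_channels=64, text_dim=256)
    img_feat = torch.randn(2, 64, 8, 8)
    sent_embs = torch.randn(2, 5, 256)
    print(module(img_feat, sent_embs).shape)  # torch.Size([2, 64, 8, 8])
```

In such a design, applying the same kind of module at several generator scales would allow each resolution to attend to whichever enriched captions are most relevant, which matches the multi-scale interaction described in the abstract.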
Funder
National Natural Science Foundation of China
Key Research and Development Program of Jiangsu Province
Natural Science Research Start-up Foundation of Recruiting Talents of Nanjing University of Posts and Telecommunications
Postgraduate Research & Practice Innovation Program of Jiangsu Province
Publisher
Association for Computing Machinery (ACM)
Cited by
1 article.