A survey of generative adversarial networks and their application in text-to-image synthesis
Published: 2023
Volume: 31
Issue: 12
Pages: 7142-7181
ISSN: 2688-1594
Container-title: Electronic Research Archive
Language: English
Short-container-title: era
Author:
Zeng Wu 1, Zhu Heng-liang 2,3, Lin Chuan 4, Xiao Zheng-ying 1
Affiliation:
1. Engineering Training Center, Putian University, Putian 351100, China
2. College of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
3. Fujian Provincial Universities Key Laboratory of Industrial Control and Data Analysis, Fuzhou 350118, China
4. School of Mechanical, Electrical & Information Engineering, Putian University, Putian 351100, China
Abstract
With the continuous development of science and technology, and especially of computing devices with powerful computational capabilities, image generation based on deep learning has made significant progress. Cross-modal deep learning techniques that generate images from textual descriptions have become a hot research topic. Text-to-image (T2I) synthesis has applications in many areas of computer vision, such as image enhancement, artificial intelligence painting, games and virtual reality. T2I generation based on generative adversarial networks (GANs) can produce realistic and diverse images, but it still faces shortcomings and challenges, such as difficulty in generating complex backgrounds. This review is organized as follows. First, we introduce the basic principles and architectures of fundamental and classic GANs. Second, we categorize T2I synthesis methods into four main groups: methods based on semantic enhancement, methods based on progressive structures, methods based on attention, and methods based on introducing additional signals. We select several classic and recent T2I methods, introduce them, and explain their main advantages and shortcomings. Third, we describe the common datasets and evaluation metrics in the T2I field. Finally, we discuss prospects for future research directions. This review provides a systematic introduction to basic GAN methods and to the T2I methods built on them, and can serve as a reference for researchers.
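To make the adversarial principle behind these T2I methods concrete, the sketch below trains a toy GAN on one-dimensional Gaussian data: a generator maps noise to samples while a discriminator learns to tell real samples from generated ones. This is a minimal illustration of the standard GAN minimax game, not an implementation from the paper; the network sizes, learning rates and target distribution are illustrative assumptions.

```python
# Minimal GAN sketch (PyTorch). All hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

latent_dim = 8

# Generator G: maps a noise vector z to a (fake) 1-D sample.
G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, 1))
# Discriminator D: estimates the probability that a sample is real.
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0  # assumed toy target: "real" data ~ N(2, 0.5)
    fake = G(torch.randn(64, latent_dim))  # generated samples

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0,
    # i.e. maximize log D(x) + log(1 - D(G(z))).
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step (non-saturating variant): maximize log D(G(z))
    # by labeling the fake samples as real in the generator's loss.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

The generator here uses the non-saturating loss (maximize log D(G(z))) rather than the raw minimax form (minimize log(1 - D(G(z)))), a variant proposed alongside the original GAN formulation because it gives the generator stronger gradients early in training.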
Publisher
American Institute of Mathematical Sciences (AIMS)
Subject
General Mathematics
References: 101 articles.
Cited by: 1 article.