Affiliations:
1. Bengal Institute of Technology, India
2. GIET University, India
Abstract
The intersection of computer vision and natural language processing (NLP) has seen significant advances in recent research, particularly in converting text into meaningful images using generative AI and large language models. This work reviews the progress made in text-to-image generation (TIG). The survey covers the three primary approaches in the field: diffusion models (DM), generative adversarial network (GAN) approaches, and autoregressive approaches. The authors also present a chronology of TIG, from its origin to the most recent developments, giving readers a broad perspective on the field's progression. The survey focuses heavily on identifying the existing limitations of DM in image generation and discusses multiple research publications and their contributions to overcoming these limitations. By focusing on key difficulties and examining how different works have addressed them, the survey provides useful insights into the advances in TIG using generative AI.