1. Ashual, O., & Wolf, L. (2019). Specifying object attributes and relations in interactive scene generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 4561–4569).
2. Balaji, Y., et al. (2022). eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers.
3. Brock, A., et al. (2018). Large scale GAN training for high fidelity natural image synthesis.
4. De Vries, H., et al. (2017). Modulating early visual processing by language. In Advances in Neural Information Processing Systems.
5. Ding, M., et al. (2021). CogView: Mastering text-to-image generation via transformers. In Advances in Neural Information Processing Systems.