1. Sora: A review on background, technology, limitations, and opportunities of large vision models;Liu,2024
2. Hierarchical text-conditional image generation with CLIP latents;Ramesh,2022
3. Photorealistic text-to-image diffusion models with deep language understanding;Saharia;NeurIPS,2022
4. GLIGEN: Open-set grounded text-to-image generation;Li;CVPR,2023
5. GPT-4 technical report;Achiam,2023