1. Training language models to follow instructions with human feedback;Ouyang;NeurIPS,2022
2. AudioPaLM: A Large Language Model That Can Speak and Listen;Rubenstein;arXiv preprint arXiv:2306.12925,2023
3. Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale;Le;arXiv preprint arXiv:2306.15687,2023
4. Text-to-audio generation using instruction-tuned LLM and latent diffusion model;Ghosal;arXiv preprint arXiv:2304.13731,2023
5. Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models;Huang;arXiv preprint arXiv:2301.12661,2023