1. Wu. A unified model for zero-shot singing voice conversion and synthesis.
2. DeepSinger: Singing voice synthesis with data mined from the web.
3. Wang. MR-SVS: Singing voice synthesis with multi-reference encoder. arXiv preprint arXiv:2201.03864, 2022.
4. Shen. NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers. CoRR, 2023.
5. Ren. FastSpeech: Fast, robust and controllable text to speech.