1. HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis;kong;NeurIPS,2020
2. Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech;kim;ICML,2021
3. The LJ Speech dataset;ito,2017
4. U-Net: Convolutional networks for biomedical image segmentation;ronneberger;International Conference on Medical Image Computing and Computer-Assisted Intervention,2015
5. Nix-TTS: An incredibly lightweight end-to-end text-to-speech model via non end-to-end distillation;chevi,2022