1. Ardila, R., et al.: Common Voice: a massively-multilingual speech corpus. In: Proceedings of the Twelfth Language Resources and Evaluation Conference, pp. 4218–4222. European Language Resources Association, Marseille, France, May 2020. https://aclanthology.org/2020.lrec-1.520
2. Casanova, E., et al.: TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese. Lang. Resour. Eval. 1–13 (2022)
3. Casanova, E., et al.: SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model. arXiv preprint arXiv:2104.05557 (2021). https://arxiv.org/abs/2104.05557
4. Casanova, E., Weber, J., Shulby, C.D., Junior, A.C., Gölge, E., Ponti, M.A.: YourTTS: towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone. In: International Conference on Machine Learning, pp. 2709–2720. PMLR (2022)
5. Chung, J.S., et al.: In defence of metric learning for speaker recognition. arXiv preprint arXiv:2003.11982 (2020)