Authors:
Leonardo B. de M. M. Marques, Lucas H. Ueda, Flávio O. Simões, Mário Uliani Neto, Fernando O. Runstein, Edson J. Nagle, Bianca Dal Bó, Paula D. P. Costa
Publisher:
Springer International Publishing
References (35 articles):
1. Aggarwal, V., Cotescu, M., Prateek, N., Lorenzo-Trueba, J., Barra-Chicote, R.: Using VAEs and normalizing flows for one-shot text-to-speech synthesis of expressive speech. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6179–6183 (2020). https://doi.org/10.1109/ICASSP40776.2020.9053678
2. Aylett, M.P., Clark, L., Cowan, B.R., Torre, I.: Building and designing expressive speech synthesis. In: The Handbook on Socially Interactive Agents: 20 years of Research on Embodied Conversational Agents, Intelligent Virtual Agents, and Social Robotics Volume 1: Methods, Behavior, Cognition, pp. 173–212. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3477322
3. Chen, N., Zhang, Y., Zen, H., Weiss, R.J., Norouzi, M., Chan, W.: WaveGrad: estimating gradients for waveform generation (2020). https://doi.org/10.48550/ARXIV.2009.00713
4. Chen, Z., et al.: InferGrad: improving diffusion models for vocoder by considering inference in training (2022). https://doi.org/10.48550/ARXIV.2202.03751
5. Dhariwal, P., Nichol, A.: Diffusion models beat GANs on image synthesis. In: Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems, vol. 34, pp. 8780–8794. Curran Associates, Inc. (2021). https://proceedings.neurips.cc/paper/2021/file/49ad23d1ec9fa4bd8d77d02681df5cfa-Paper.pdf