1. Arik, S.Ö., et al.: Deep voice: real-time neural text-to-speech. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 195–204. JMLR.org (2017)
2. Wang, Y., et al.: Tacotron: towards end-to-end speech synthesis. arXiv preprint arXiv:1703.10135 (2017)
3. Ping, W., et al.: Deep voice 3: scaling text-to-speech with convolutional sequence learning. arXiv preprint arXiv:1710.07654 (2017)
4. van den Oord, A., et al.: Wavenet: a generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016)
5. Salza, P.L., Foti, E., Nebbia, L., Oreglia, M.: MOS and pair comparison combined methods for quality evaluation of text-to-speech systems. Acta Acust. United Acust. 82(4), 650–656 (1996)