1. Van den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499. Accessed 19 Sep 2016
2. Orhan MC, Demiroğlu C (2011) HMM-based text to speech system with speaker interpolation. 2011 IEEE 19th Signal Processing and Communications Applications Conference (SIU). IEEE, pp 781–784
3. Arık SÖ, Chrzanowski M, Coates A, Diamos G, Gibiansky A, Kang Y, Li X et al (2017) Deep Voice: Real-time neural text-to-speech. International Conference on Machine Learning. PMLR, pp 195–204
4. Sotelo J, Mehri S, Kumar K, Santos JF, Kastner K, Courville A, Bengio Y (2017) Char2Wav: End-to-end speech synthesis. International Conference on Learning Representations (ICLR), Workshop Track
5. Mehri S, Kumar K, Gulrajani I, Kumar R, Jain S, Sotelo J, Courville A, Bengio Y (2016) SampleRNN: An unconditional end-to-end neural audio generation model. arXiv preprint arXiv:1612.07837. Accessed 11 Feb 2017