1. Klatt, D.H. (1987). Review of Text-to-speech Conversion for English. J. Acoust. Soc. Am.
2. Hinton, G.E., and Salakhutdinov, R.R. (2006). Reducing the Dimensionality of Data with Neural Networks. Science.
3. Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A Generative Model for Raw Audio. arXiv.
4. Sotelo, J., Mehri, S., Kumar, K., Santos, J.F., Kastner, K., Courville, A., and Bengio, Y. (2017, April 24–26). Char2Wav: End-to-end Speech Synthesis. Proceedings of the 5th International Conference on Learning Representations, Toulon, France.
5. Mehri, S., Kumar, K., Gulrajani, I., Kumar, R., Jain, S., Sotelo, J., Courville, A., and Bengio, Y. (2016). SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. arXiv.