1. Alencar, V., & Alcaim, A. (2008). LSF and LPC-derived features for large vocabulary distributed continuous speech recognition in Brazilian Portuguese. In 2008 42nd Asilomar conference on signals, systems and computers (pp. 1237–1241). IEEE.
2. Arık, S. O., Chrzanowski, M., Coates, A., Diamos, G., Gibiansky, A., Kang, Y., Li, X., Miller, J., Raiman, J., Sengupta, S., & Ng, A. (2017). Deep voice: Real-time neural text-to-speech. arXiv preprint. http://arxiv.org/abs/1702.07825
3. Arık, S. O., Diamos, G., Gibiansky, A., Miller, J., Peng, K., Ping, W., Raiman, J., & Zhou, Y. (2017). Deep voice 2: Multi-speaker neural text-to-speech. arXiv preprint. http://arxiv.org/abs/1705.08947
4. Aroon, A., & Dhonde, S. (2015). Statistical parametric speech synthesis: A review. In 2015 IEEE 9th international conference on intelligent systems and control (ISCO) (pp. 1–5). IEEE.
5. Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint. http://arxiv.org/abs/1607.06450