1. Wang Y, Skerry-Ryan R, Stanton D, Wu Y, Weiss RJ, Jaitly N, Yang Z, Xiao Y, Chen Z, Bengio S et al (2017) Tacotron: Towards end-to-end speech synthesis. In: Proceedings of Interspeech 2017, pp 4006–4010
2. Shen J, Pang R, Weiss RJ, Schuster M, Jaitly N, Yang Z, Chen Z, Zhang Y, Wang Y, Skerry-Ryan R et al (2018) Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4779–4783
3. Mehri S, Kumar K, Gulrajani I, Kumar R, Jain S, Sotelo J, Courville A, Bengio Y (2017) SampleRNN: an unconditional end-to-end neural audio generation model. In: International conference on learning representations
4. Valin JM, Skoglund J (2019) LPCNet: improving neural speech synthesis through linear prediction. In: 2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5891–5895
5. Ping W, Peng K, Gibiansky A, Arik SO, Kannan A, Narang S, Raiman J, Miller J (2018) Deep Voice 3: 2000-speaker neural text-to-speech. In: International conference on learning representations, pp 214–217