1. van den Oord A., Dieleman S., Zen H., Simonyan K., Vinyals O., Graves A., Kalchbrenner N., Senior A., Kavukcuoglu K. WaveNet: A generative model for raw audio. arXiv:1609.03499, 2016.
2. Shen J., Pang R., Weiss R. J., Schuster M., Jaitly N., Yang Z., Chen Z., Zhang Y., Wang Y., Skerry-Ryan R. Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. In Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018:4779-4783.
3. Arik S., Diamos G., Gibiansky A., Miller J., Peng K., Ping W., Raiman J., Zhou Y. Deep Voice 2: Multi-speaker neural text-to-speech. arXiv:1705.08947, 2017.
4. Valin J.-M., Skoglund J. LPCNet: Improving neural speech synthesis through linear prediction. arXiv:1810.11846, 2018.
5. Griffin D., Lim J. A new model-based speech analysis/synthesis system. In Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1985;10:513-516.