1. van den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) WaveNet: a generative model for raw audio. Preprint. arXiv:1609.03499
2. Kalchbrenner N, Elsen E, Simonyan K, Noury S, Casagrande N, Lockhart E, Stimberg F, van den Oord A, Dieleman S, Kavukcuoglu K (2018) Efficient neural audio synthesis. In: International conference on machine learning. PMLR, pp 2410–2419
3. Liu Y, Xue R, He L, Tan X, Zhao S (2022) DelightfulTTS 2: end-to-end speech synthesis with adversarial vector-quantized auto-encoders. Preprint. arXiv:2207.04646
4. Cong J, Yang S, Xie L, Su D (2021) Glow-WaveGAN: learning speech representations from GAN-based variational auto-encoder for high fidelity flow-based speech synthesis. Preprint. arXiv:2106.10831
5. Hayashi T, Watanabe S (2020) DiscreTalk: text-to-speech as a machine translation problem. Preprint. arXiv:2005.05525