1. Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aäron van den Oord, Sander Dieleman, and Koray Kavukcuoglu. 2018. Efficient neural audio synthesis. In Proceedings of the 35th International Conference on Machine Learning (ICML’18). PMLR, 2415–2424.
2. Hideki Kawahara. 2006. STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds. Acoustical Science and Technology 27, 6 (2006), 349–353.
3. Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee, and Seong-Whan Lee. 2021. Fre-GAN: Adversarial frequency-consistent audio synthesis. In Proceedings of the 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH’21). ISCA, 2197–2201.
4. Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. 2020. HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems (NeurIPS’20).
5. Kundan Kumar, Rithesh Kumar, Thibault de Boissière, Lucas Gestin, Wei Zhen Teoh, Jose M. R. Sotelo, Alexandre de Brébisson, Yoshua Bengio, and Aaron C. Courville. 2019. MelGAN: Generative adversarial networks for conditional waveform synthesis. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems (NeurIPS’19). 14881–14892.