1. Angelini, O., Moinet, A., Yanagisawa, K., Drugman, T., 2020. Singing Synthesis: With a Little Help from my Attention. In: Proc. Interspeech.
2. Baevski, A., Zhou, Y., Mohamed, A., Auli, M., 2020. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. In: Proc. NeurIPS. Vol. 33.
3. Bak, T., Bae, J.-S., Bae, H., Kim, Y.-I., Cho, H.-Y., 2021. FastPitchFormant: Source-Filter Based Decomposed Modeling for Speech Synthesis. In: Proc. Interspeech. pp. 116–120.
4. Battenberg, E., et al., 2019. Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis.
5. Battenberg, E., et al., 2020. Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis.