1. Wang Y, Stanton D, Zhang Y, Skerry-Ryan R, Battenberg E, Shor J, Xiao Y, Jia Y, Ren F, Saurous RA (2018) Style tokens: unsupervised style modeling, control and transfer in end-to-end speech synthesis. In: International conference on machine learning (PMLR), pp 5180–5189
2. Skerry-Ryan R, Battenberg E, Xiao Y, Wang Y, Stanton D, Shor J, Weiss R, Clark R, Saurous RA (2018) Towards end-to-end prosody transfer for expressive speech synthesis with Tacotron. In: International conference on machine learning (PMLR), pp 4693–4702
3. Chen M, Tan X, Li B, Liu Y, Qin T, Zhao S, Liu TY (2021) AdaSpeech: adaptive text to speech for custom voice. In: International conference on learning representations. https://openreview.net/forum?id=Drynvt7gg4L
4. Hsu WN, Zhang Y, Weiss RJ, Zen H, Wu Y, Wang Y, Cao Y, Jia Y, Chen Z, Shen J et al (2018) Hierarchical generative modeling for controllable speech synthesis. In: International conference on learning representations
5. Hsu WN, Zhang Y, Weiss RJ, Chung YA, Wang Y, Wu Y, Glass J (2019) Disentangling correlated speaker and noise for speech synthesis via data augmentation and adversarial factorization. In: 2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5901–5905