1. Deep voice 3: 2000-speaker neural text-to-speech;wei;Proc ICLR,0
2. Durian: Duration informed attention network for multimodal synthesis;yu;ArXiv Preprint,2019
3. Fastspeech 2: Fast and high-quality end-to-end text-to-speech;ren;ArXiv Preprint,2020
4. Neural Speech Synthesis with Transformer Network
5. Glow-tts: A generative flow for text-to-speech via monotonic alignment search;kim;Advances in neural information processing systems,2020