VITS, Tacotron or FastSpeech? Challenging Some of the Most Popular Synthesizers

Published: 2023 Page: 322-335
ISSN: 0302-9743
Container-title: Lecture Notes in Computer Science

Author:

Matoušek Jindřich, Tihelka Daniel, Tihelková Alice

Publisher

Springer Nature Switzerland

Link

https://link.springer.com/content/pdf/10.1007/978-3-031-47665-5_26

Reference: 37 articles.

1. Beerends, J., et al.: Perceptual objective listening quality assessment (POLQA), the third generation ITU-T standard for end-to-end speech quality measurement part I-temporal alignment. AES: J. Audio Eng. Soc. 61, 366–384 (2013)

2. Casanova, E., Weber, J., Shulby, C., Junior, A.C., Gölge, E., Ponti, M.A.: YourTTS: towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone (2021)

3. Cho, H., Jung, W., Lee, J., Woo, S.H.: SANE-TTS: stable and natural end-to-end multilingual text-to-speech. In: Ko, H., Hansen, J.H.L. (eds.) 23rd Annual Conference of the International Speech Communication Association, Interspeech 2022, Incheon, Korea, 18–22 September 2022, pp. 1–5. ISCA (2022). https://doi.org/10.21437/Interspeech.2022-46

4. Delalez, S., Akue, L.: Neural TTS in French: comparing graphemic and phonetic inputs using the SynPaFlex-Corpus and Tacotron2 (2023)

5. Elias, I., et al.: Parallel Tacotron 2: a non-autoregressive neural TTS model with differentiable duration modeling. In: Proceedings of the Interspeech 2021, pp. 141–145 (2021). https://doi.org/10.21437/Interspeech.2021-1461

Cited by 1 article.

1. Sentences vs Phrases in Neural Speech Synthesis; Lecture Notes in Computer Science; 2024