
Multi speaker text-to-speech synthesis using generalized end-to-end loss function

Published:2024-01-13 Issue:24 Volume:83 Page:64205-64222
ISSN:1573-7721
Container-title:Multimedia Tools and Applications
language:en
Short-container-title:Multimed Tools Appl

Author:

Nazir Owais, Malik Aruna, Singh Samayveer, Pathan Al-Sakib Khan

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s11042-024-18121-2.pdf

References: 34 articles.

1. Zen H, Nose T, Yamagishi J, Sako S, Masuko T, Black AW, Tokuda K (2007) The HMM-based speech synthesis system (HTS) version 2.0. SSW 6:294–299

2. Van den Oord A, Kalchbrenner N, Espeholt L, Vinyals O, Graves A (2016) Conditional image generation with PixelCNN decoders. Adv Neural Inf Process Syst 29:1–9. arXiv preprint arXiv:1606.05328

3. Van den Oord A, Kalchbrenner N, Kavukcuoglu K (2016) Pixel recurrent neural networks. In: International Conference on Machine Learning, PMLR, pp. 1747–1756

4. Wang Y, Skerry-Ryan RJ, Stanton D, Wu Y, Weiss RJ, Jaitly N, Yang Z, Xiao Y, Chen Z, Bengio S, Le Q, Agiomyrgiannakis Y, Clark R, Saurous RA (2017) Tacotron: Towards end-to-end speech synthesis. arXiv preprint arXiv:1703.10135

5. Griffin D, Lim J (1984) Signal estimation from modified short-time Fourier transform. IEEE Trans Acoust Speech Signal Process 32(2):236–243

Cited by 3 articles.

1. Enhancing spoken dialect identification with stacked generalization of deep learning models;Multimedia Tools and Applications;2024-09-04

2. Knowledge in attention assistant for improving generalization in deep teacher–student models;International Journal of Modelling and Simulation;2024-08-22

3. Dual-branch network with fused Mel features for logic-manipulated speech detection;Applied Acoustics;2024-06