iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis Based on Disentanglement Between Prosody and Timbre

Authors:

Zhang Guangyan¹, Qin Ying², Zhang Wenjie¹, Wu Jialun¹, Li Mei¹, Gai Yutao¹, Jiang Feijun¹, Lee Tan³

Affiliation:

1. Intelligent Connectivity, Cloud & Technology, Alibaba Group, Hangzhou, China

2. Institute of Information Science, Beijing Jiaotong University, Beijing, China

3. DSP & Speech Technology Laboratory, Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong

Funder

Alibaba Group through Alibaba Research Intern Program

Fundamental Research Funds for the Central Universities

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Subject

Electrical and Electronic Engineering, Acoustics and Ultrasonics, Computer Science (miscellaneous), Computational Mathematics

References: 64 articles.

3. Emotional end-to-end neural speech synthesizer; Lee; Proc. Adv. Neural Inf. Process. Syst., Neural Inf. Process. Syst. Found.

4. The concrete distribution: A continuous relaxation of discrete random variables; Maddison; Proc. Int. Conf. Learn. Representations

Cited by 9 articles.

2. A Survey on Voice Cloning and Automated Video Dubbing Systems; 2024 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET); 2024-03-21

5. Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis; 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU); 2023-12-16