Stylespeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis

Published: 2024-04-14 Volume: 30 Page: 12316-12320
Container-title: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Author:

Chen Xueyuan¹, Wang Xi², Zhang Shaofei², He Lei², Wu Zhiyong¹, Wu Xixin¹, Meng Helen¹

Affiliation:

1. The Chinese University of Hong Kong, Department of Systems Engineering and Engineering Management, Hong Kong SAR, China

2. Microsoft, Beijing, China

Funder

National Natural Science Foundation of China

Publisher

IEEE

Reference: 24 articles.

3. Fastspeech 2: Fast and high-quality end-to-end text to speech;Ren

4. Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis;Wang

Cited by 2 articles.

2. Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14