THE VNPT-IT EMOTION TRANSPLANTATION APPROACH FOR VLSP 2022

Published:2023-11-21 Page:369-379
ISSN:2815-5939
Container-title:Journal of Computer Science and Cybernetics
Short-container-title:JCC

Author:

Nguyen Van Thang, Luong Thanh Long, Vu Huan

Abstract

Emotional speech synthesis is a challenging task in speech processing. Building an emotional text-to-speech (TTS) system normally requires a high-quality emotional dataset of the target speaker, but collecting such data is difficult and sometimes impossible. This paper presents our approach to transplanting a source speaker's emotional expression to a target speaker, one of the Vietnamese Language and Speech Processing (VLSP) 2022 TTS tasks. Our approach includes a complete data pre-processing pipeline and two training algorithms. We first train an expressive TTS model on the source speaker, then adapt its voice characteristics to the target speaker. Empirical results show the efficacy of our method in generating expressive speech for a speaker under a limited training data regime.

Publisher

Publishing House for Science and Technology, Vietnam Academy of Science and Technology (Publications)

Subject

Industrial and Manufacturing Engineering, Environmental Engineering

Reference: 15 articles.

1. Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder–decoder approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 103–111, Doha, Qatar. Association for Computational Linguistics.

2. Xiang Hao, Xiangdong Su, Radu Horaud, and Xiaofei Li. 2021. FullSubNet: A full-band and sub-band fusion model for real-time single-channel speech enhancement. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6633–6637.

3. Young-Sun Joo, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho, and Hong-Goo Kang. 2020. Effective emotion transplantation in an end-to-end text-to-speech system. IEEE Access, 8:161713–161719.

4. Jaehyeon Kim, Jungil Kong, and Juhee Son. 2021. Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech. In International Conference on Machine Learning, pages 5530–5540. PMLR.

5. Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.