THE VNPT-IT EMOTION TRANSPLANTATION APPROACH FOR VLSP 2022

Author:

Nguyen Van Thang,Luong Thanh Long,Vu Huan

Abstract

Emotional speech synthesis is a challenging task in speech processing. To build an emotional Text-to-speech (TTS) system, one would need to have a quality emotional dataset of the target speaker. However, collecting such data is difficult, sometimes even impossible. This paper presents our approach that addresses the problem of transplanting a source speaker's emotional expression to a target speaker, one of the Vietnamese Language and Speech Processsing (VLSP) 2022 TTS tasks. Our approach includes a complete data pre-processing pipeline and two training algorithms. We first train a source speaker's expressive TTS model, then adapt the voice characteristics for the target speaker. Empirical results have shown the efficacy of our method in generating the expressive speech of a speaker under a limited training data regime.

Publisher

Publishing House for Science and Technology, Vietnam Academy of Science and Technology (Publications)

Subject

Industrial and Manufacturing Engineering,Environmental Engineering

Reference15 articles.

1. Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bah- ¨danau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder–decoder approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 103–111, Doha, Qatar. Association for Computational Linguistics.

2. Xiang Hao, Xiangdong Su, Radu Horaud, and Xiaofei Li. 2021. Fullsubnet: A full-band and sub-band fusion model for real-time single channel speech enhancement. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6633–6637.

3. Young-Sun Joo, Hanbin Bae, Young-Ik Kim, HoonYoung Cho, and Hong-Goo Kang. 2020. Effective emotion transplantation in an end-to end textto-speech system. IEEE Access, 8:161713–161719.

4. Jaehyeon Kim, Jungil Kong, and Juhee Son. 2021. Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech. In International Conference on Machine Learning, pages 5530–5540. PMLR.

5. Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3