Speaker Adaptation Experiments with Limited Data for End-to-End Text-To-Speech Synthesis using Tacotron2

Published:2022 Issue:3 Volume:14 Page:55-62
ISSN:2061-2079
Container-title:Infocommunications journal
Short-container-title:Infocommunications journal

Author:

Mandeel Ali Raheem,Al-Radhi Mohammed Salah,Csapó Tamás Gábor

Abstract

Speech synthesis aims to generate humanlike speech from text. With current end-to-end systems, highly natural synthesized speech can be achieved if a large enough dataset is available from the target speaker. Often, however, it is necessary to adapt to a target speaker for whom only a few training samples are available. Speaker adaptation with limited data can be a difficult problem because of the very small number of training samples. A limited speaker dataset can cause issues such as an irregular allocation of linguistic tokens (i.e., some speech sounds are left out of the synthesized speech). To build lightweight systems, it is crucial to measure the minimum number of data samples and training epochs needed to reach reasonable quality. We conducted detailed experiments with four target speakers for speaker-adaptive text-to-speech (TTS) synthesis to show the performance of the end-to-end Tacotron2 model and the WaveGlow neural vocoder on an English dataset at several training-set sizes and training lengths. According to our objective and subjective evaluations, the Tacotron2 model exhibits good performance in terms of speech quality and similarity for unseen target speakers with 100 sentences of data (text-audio pairs) and a relatively short training time.

Publisher

Infocommunications Journal

Subject

Electrical and Electronic Engineering,General Computer Science

Link

https://www.infocommunications.hu/documents/169298/4805811/InfocomJ_2022_3_7_Mandeel.pdf

Cited by 4 articles.

1. Effects of Training Strategies and the Amount of Speech Data on the Quality of Speech Synthesis;Lecture Notes in Computer Science;2024

2. Enhancing End-to-End Speech Synthesis by Modeling Interrogative Sentences with Speaker Adaptation;2023 International Conference on Speech Technology and Human-Computer Dialogue (SpeD);2023-10-25

3. Modeling Irregular Voice in End-to-End Speech Synthesis via Speaker Adaptation;2023 International Conference on Speech Technology and Human-Computer Dialogue (SpeD);2023-10-25

4. A Smart Control System for the Oil Industry Using Text-to-Speech Synthesis Based on IIoT;Electronics;2023-08-08