A comparison of multi-style DNN-based TTS approaches using small datasets

Published:2018 Volume:161 Page:03005
ISSN:2261-236X
Container-title:MATEC Web of Conferences
Short-container-title:MATEC Web Conf.

Author:

Suzić Siniša,Delić Tijana,Jovanović Vladimir,Sečujski Milan,Pekar Darko,Delić Vlado

Abstract

Studies have shown that people already perceive interaction with computers, robots and media in the same way as they perceive social communication with other people. For that reason, it is critical for a high-quality text-to-speech (TTS) system to sound as human-like as possible. However, a major obstacle in creating expressive TTS voices is that the amount of style-specific speech available for training such a system is often insufficient. This paper presents a comparison of different approaches to multi-style TTS, with a focus on cases where only a small dataset per style is available. The described approaches were originally proposed for efficient modelling of multiple speakers with a limited amount of data per speaker. Among the compared approaches, the one based on style codes emerged as the best, regardless of the target speech style.

Publisher

EDP Sciences

Subject

General Medicine

Link

https://www.matec-conferences.org/10.1051/matecconf/201816103005/pdf

References: 21 articles.

1. Csapo A. et al., Cognitive Infocommunications, IEEE 3rd International Conference, 667-672 (2012)

2. Abe M., Progress in speech synthesis, 495-510 (1997)

3. Brave S., Clifford N., The human-computer interaction handbook, 94-109 (2002)

Cited by 4 articles.

1. Explicit Control of the Level of Expressiveness in DNN-Based Speech Synthesis by Embedding Interpolation;Speech and Computer;2021

2. DNN Based Expressive Text-to-Speech with Limited Training Data;2019 27th Telecommunications Forum (TELFOR);2019-11

3. Style Transplantation in Neural Network-based Speech Synthesis;Acta Polytechnica Hungarica;2019-08-27

4. Toward More Expressive Speech Communication in Human-Robot Interaction;Lecture Notes in Computer Science;2018