Affiliation:
1. Pohang University of Science and Technology
2. Catholic University of Daegu
Abstract
This article presents a prosodic phrasing model for a general purpose Korean speech synthesis system. To reflect the factors affecting prosodic phrasing in the model, linguistically motivated machine-learning features were investigated. These features were effectively incorporated using a stacking model. The phrasing performance was also improved through feature engineering. The corpus used in the experiment is a 4,392-sentence corpus (55,015 words with an average of 13 words per sentence). Because the corpus contains speaker-dependent variability and such variability is not appropriately reflected in a general purpose speech synthesis system, a method to reduce such variability is proposed. In addition, the entire set of data used in the experiment is provided to the public for future use in comparative research.
Funder
Ministry of Education, Science and Technology
Publisher
Association for Computing Machinery (ACM)