Abstract
Speech processing technology has great potential in the medical field to provide beneficial solutions for both patients and doctors. Speech interfaces, represented by speech synthesis and speech recognition, can be used to transcribe medical documents, control medical devices, correct speech and hearing impairments, and assist the visually impaired. However, it is essential to predict prosody phrase boundaries for accurate natural speech synthesis. This study proposes a method to build a reliable learning corpus to train prosody boundary prediction models based on deep learning. In addition, we offer a way to generate a rule-based model that can predict the prosody boundary from the constructed corpus and use the result to train a deep learning-based model. As a result, we have built a coherent corpus, even though many workers have participated in its development. The estimated pairwise agreement of corpus annotations is between 0.7477 and 0.7916 and kappa coefficient (K) between 0.7057 and 0.7569. In addition, the deep learning-based model based on the rules obtained from the corpus showed a prediction accuracy of 78.57% for the three-level prosody phrase boundary, 87.33% for the two-level prosody phrase boundary.
Funder
Institute for Information and Communications Technology Promotion
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering