Author:
Bang Jeong-Uk,Kim Sang-Hun,Kwon Oh-Wook
Abstract
We propose a method to extend a phoneme set by using a large amount of broadcast data to improve the performance of Korean spontaneous speech recognition. In the proposed method, we first extract variable-length phoneme-level segments from broadcast data and then convert them into fixed-length embedding vectors based on a long short-term memory architecture. We use decision tree-based clustering to find acoustically similar embedding vectors and then build new acoustic subword units by gathering the clustered vectors. To update the lexicon of a speech recognizer, we build a lookup table between the tri-phone units and the units derived from the decision tree. Finally, the proposed lexicon is obtained by updating the original phoneme-based lexicon by referencing the lookup table. To verify the performance of the proposed unit, we compare the proposed unit with the previous units obtained by using the segment-based k-means clustering method or the frame-based decision-tree clustering method. As a result, the proposed unit is shown to produce better performance than the previous units in both spontaneous, and read Korean speech recognition tasks.
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献