Affiliation:
1. Institute of Acoustics, Chinese Academy of Sciences, No. 21 North 4th Ring Road, Haidian District, 100190 Beijing, China
Abstract
Malay and Indonesian are both syllable-friendly languages. Their words are convenient to convert into syllables, and a syllable-based language model (LM) can be applied in practical speech recognition systems. This paper proposes a method and an algorithm for the syllabification of Malay and Indonesian words. To evaluate the perplexity (PPL) of the syllable-based LM, the paper uses SRILM to compare the PPL of word-based, syllable-based, phoneme-based and character-based n-gram LMs. The experimental results show that the perplexity of the syllable-based LM is better than that of the word-based model, while the perplexity of the character-based or phoneme-based LM is better than that of the syllable-based LM. However, when applied in practical speech recognition systems, the syllable-based LM offers lower cost, similar performance and easier deployment compared with speech recognition systems that use deep learning models.
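The perplexity comparison described in the abstract can be illustrated with a minimal sketch. It does not call SRILM and is not the paper's setup: it assumes a bigram model with add-one smoothing, and the function name `bigram_perplexity` and the toy sentence (hand-segmented into syllables purely for illustration) are hypothetical.

```python
import math
from collections import Counter


def bigram_perplexity(train, test):
    """Per-token perplexity of an add-one-smoothed bigram model.

    `train` and `test` are lists of sentences; each sentence is a list of
    tokens (words, syllables, phonemes, or characters).
    Assumption: add-one smoothing, chosen only to keep the sketch short.
    """
    # Vocabulary over both sets, plus sentence-boundary markers.
    vocab = {tok for sent in train + test for tok in sent} | {"<s>", "</s>"}
    unigrams, bigrams = Counter(), Counter()
    for sent in train:
        padded = ["<s>"] + sent + ["</s>"]
        unigrams.update(padded[:-1])                  # counts of history tokens
        bigrams.update(zip(padded[:-1], padded[1:]))  # counts of adjacent pairs

    log_prob, n_tokens = 0.0, 0
    for sent in test:
        padded = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(padded[:-1], padded[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))
            log_prob += math.log(p)
            n_tokens += 1
    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob / n_tokens)


# Toy data: one sentence as words and as hand-segmented syllables (illustrative only).
words = [["saya", "makan", "nasi"]]
syllables = [["sa", "ya", "ma", "kan", "na", "si"]]
print(bigram_perplexity(words, words))
print(bigram_perplexity(syllables, syllables))
```

The same function accepts any tokenization, so phoneme or character sequences can be passed in the same way as the word and syllable lists above.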
Funder
Ministry of Science and Technology of the People's Republic of China
Publisher
World Scientific Pub Co Pte Ltd
Subject
General Earth and Planetary Sciences, General Engineering, General Environmental Science