Authors:
Hadiwinoto P N, Lestari D P
Abstract
Language models play an important role in the decoding process of automatic speech recognition (ASR). In Indonesian ASR, the accuracy of spontaneous speech recognition is still very low compared to that of dictated speech, owing to the lack of spontaneous speech data. Because collecting spontaneous data is also difficult, one candidate solution is to augment the existing spontaneous data. In this research, experiments are conducted on language models to improve the accuracy of spontaneous Indonesian speech recognition through data augmentation. Data augmentation is performed with the statistical machine translation system Moses. The language models are n-gram models, and acoustic modeling uses GMM-HMM. First, a spontaneous text corpus is added to the existing text corpus; then, data augmentation is conducted. Adding the spontaneous text corpus to the language model training data increases accuracy by 3.59% relative to the baseline, while applying data augmentation increases accuracy by 2.74% relative to the baseline. Although the gain from augmentation is smaller, the difference is considered insignificant compared to the effort required to collect spontaneous data manually.
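The abstract specifies only that the language models are n-gram models. To illustrate why enlarging the training text with spontaneous-style sentences can help, here is a minimal Python sketch that trains a bigram model with add-one (Laplace) smoothing, with and without extra spontaneous-style sentences, and scores an unseen spontaneous sentence. The bigram order, the smoothing method, the function names (`train_bigram`, `bigram_prob`, `sentence_log_prob`) and all sentences are hypothetical illustrations. They are not the authors' setup: their n-gram order, smoothing, toolkit and Moses-generated data are not described here.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


def train_bigram(sentences):
    """Collect history and bigram counts from whitespace-tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    vocab = {EOS}
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        vocab.update(tokens[1:-1])  # real words only; </s> was added above
        histories.update(tokens[:-1])  # every token except </s> acts as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams, vocab


def bigram_prob(model, history, word):
    """Add-one (Laplace) smoothed P(word | history)."""
    histories, bigrams, vocab = model
    return (bigrams[(history, word)] + 1) / (histories[history] + len(vocab))


def sentence_log_prob(model, sentence):
    """Sum of log P(w_i | w_{i-1}) over the sentence, including </s>.

    Words never seen in training still receive the smoothed share of
    probability mass; a real toolkit would map them to an <unk> token.
    """
    tokens = [BOS] + sentence.split() + [EOS]
    return sum(
        math.log(bigram_prob(model, h, w)) for h, w in zip(tokens[:-1], tokens[1:])
    )


# Hypothetical toy data: a "dictated" base corpus, plus spontaneous-style
# sentences standing in for the added spontaneous corpus and the Moses output.
base_corpus = ["saya pergi ke kantor", "dia membaca buku"]
extra_spontaneous = ["eh saya mau ke kantor", "eh dia mau baca buku"]

baseline_lm = train_bigram(base_corpus)
augmented_lm = train_bigram(base_corpus + extra_spontaneous)

test_sentence = "eh saya mau baca buku"  # not in either training set
print("baseline :", sentence_log_prob(baseline_lm, test_sentence))
print("augmented:", sentence_log_prob(augmented_lm, test_sentence))
```

With this toy data, the augmented model assigns a higher log-probability to the unseen spontaneous sentence, because bigrams such as `eh saya` and `mau baca` now have nonzero counts. Practical n-gram toolkits usually apply stronger smoothing (for example, Kneser-Ney) and an explicit `<unk>` token, so the actual numbers would differ.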