Affiliation:
1. ADIYAMAN ÜNİVERSİTESİ
2. GAZI UNIVERSITY, FACULTY OF TECHNOLOGY
Abstract
Current Automatic Speech Recognition (ASR) modeling strategies still suffer from severe performance degradation on low-resource languages such as Turkish. In particular, when the Language Model (LM) does not sufficiently support the Acoustic Model (AM), the Word Error Rate (WER) increases. A robust LM therefore contributes strongly to ASR performance by capturing word relations from the existing corpus. However, developing a robust LM is challenging because of the agglutinative nature of Turkish. This study therefore proposes a sentence-level LM optimization method to improve the WER of Turkish ASR. In the proposed method, instead of relying on a fixed word sequence obtained from the Markov assumption, the probability of the word sequence forming a sentence is calculated. A method with n-gram and skip-gram properties is presented to obtain the word sequence probability. The proposed method was tested on both statistical and Artificial Neural Network (ANN) based LMs. The experiments were carried out at both the word and sub-word level, using two Turkish corpora (METU and Bogazici) shared via the Linguistic Data Consortium (LDC) and a separate corpus, named HS, that we created specially. With the statistical LM, a 0.5% WER increase was observed for the METU corpus, a 1.6% WER decrease for the Bogazici corpus, and a 2.5% WER decrease for the HS corpus. With the Feedforward Neural Network (FNN) based LM, WER decreased by 0.2% for the METU corpus, 0.8% for the Bogazici corpus, and 1.6% for the HS corpus. With the Recurrent Neural Network (RNN) Long Short-Term Memory (LSTM) based LM, WER decreased by 0.6% for the METU corpus, 1.1% for the Bogazici corpus, and 1.5% for the HS corpus.
As a result, when the proposed method was applied to the LMs required for ASR, WER decreased and the overall performance of ASR increased.
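Since every result above is reported as a change in WER, a minimal sketch of how WER is conventionally computed may help: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the scoring tool the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is plain
    whitespace splitting, which is an assumption, not the paper's setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of four reference words.
    print(word_error_rate("bu bir deneme cümlesi", "bu bir deneme"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.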
Subject
Colloid and Surface Chemistry, Physical and Theoretical Chemistry
Cited by
1 article.