A Language Model Optimization Method for a Turkish Automatic Speech Recognition System

Author:

OYUCU Saadin (1), POLAT Hüseyin (2)

Affiliation:

1. ADIYAMAN UNIVERSITY

2. GAZI UNIVERSITY, FACULTY OF TECHNOLOGY

Abstract

Current Automatic Speech Recognition (ASR) modeling strategies still suffer severe performance degradation on low-resource languages such as Turkish. The Word Error Rate (WER) rises especially when the Language Model (LM) does not sufficiently support the Acoustic Model (AM). A robust LM, which captures word relations from the available corpus, therefore contributes strongly to ASR performance. However, building a robust LM is challenging because of the agglutinative nature of Turkish. This study therefore proposes a sentence-level LM optimization method to improve the WER performance of Turkish ASR. Instead of the fixed-length word sequences obtained under the Markov assumption, the proposed method calculates the probability of the word sequence that forms a whole sentence. To obtain this probability, a method with both n-gram and skip-gram properties is presented. The method was tested on both statistical and Artificial Neural Network (ANN) based LMs. The experiments were carried out at both the word and sub-word levels, using two Turkish corpora (METU and Bogazici) distributed through the Linguistic Data Consortium (LDC) and a separate corpus, HS, that we created specifically for this study. With the statistical LM, WER increased by 0.5% for the METU corpus and decreased by 1.6% for the Bogazici corpus and by 2.5% for the HS corpus. With the Feedforward Neural Network (FNN) based LM, WER decreased by 0.2% for METU, 0.8% for Bogazici, and 1.6% for HS. With the Recurrent Neural Network (RNN) Long Short-Term Memory (LSTM) based LM, WER decreased by 0.6% for METU, 1.1% for Bogazici, and 1.5% for HS.
Overall, applying the proposed method to the LMs used for ASR reduced WER and improved overall ASR performance.
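The abstract does not give the exact formulation of the sentence-level probability, so the following Python sketch only illustrates the general idea of combining n-gram and skip-gram information when scoring a whole sentence; it is not the authors' method. It assumes add-one smoothing, linear interpolation of an ordinary bigram with a skip-bigram that jumps one word, hypothetical weights (0.7, 0.3), and a hypothetical three-sentence toy corpus. The helper names `train_counts`, `pair_prob` and `sentence_logprob` are likewise illustrative.

```python
import math
from collections import Counter


def train_counts(sentences, max_distance=2):
    """Count (context, word) pairs at distances 1..max_distance.

    Distance 1 is an ordinary bigram; a distance d > 1 is a skip-bigram
    that jumps over d - 1 intervening words.
    """
    pair_counts = Counter()     # (d, context, word) -> count
    context_counts = Counter()  # (d, context) -> count
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for i, context in enumerate(tokens):
            for d in range(1, max_distance + 1):
                if i + d < len(tokens):
                    pair_counts[(d, context, tokens[i + d])] += 1
                    context_counts[(d, context)] += 1
    return pair_counts, context_counts, vocab


def pair_prob(d, context, word, model):
    """Add-one smoothed P(word | context seen d positions earlier)."""
    pair_counts, context_counts, vocab = model
    return (pair_counts[(d, context, word)] + 1) / (
        context_counts[(d, context)] + len(vocab)
    )


def sentence_logprob(sentence, model, weights):
    """Log-probability of a whole sentence under interpolated pair models.

    weights[d - 1] is the interpolation weight for distance d. Near the
    start of the sentence, distances that would reach before <s> are
    skipped and the remaining weights are renormalised.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for i in range(1, len(tokens)):
        prob, used = 0.0, 0.0
        for d, weight in enumerate(weights, start=1):
            if i - d < 0:
                continue
            prob += weight * pair_prob(d, tokens[i - d], tokens[i], model)
            used += weight
        total += math.log(prob / used)
    return total


# Toy corpus and weights are hypothetical, chosen only for illustration.
corpus = ["ben eve gidiyorum", "sen eve gidiyorsun", "ben okula gidiyorum"]
model = train_counts(corpus, max_distance=2)
weights = (0.7, 0.3)  # bigram weight, skip-bigram weight
print(round(sentence_logprob("ben eve gidiyorum", model, weights), 2))  # -5.48
print(round(sentence_logprob("gidiyorum eve ben", model, weights), 2))  # -8.35
```

Because the skip-bigram term conditions on the word two positions back, it can still reward a plausible ordering when the immediate bigram was never observed, which is one way such features may help with sparse, agglutinative vocabularies. A practical system would need stronger smoothing than add-one.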
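All of the results above are reported as WER changes. WER is conventionally defined as the word-level Levenshtein distance between the reference transcript and the recognizer output, normalised by the reference length: WER = (S + D + I) / N, where S, D and I are substitutions, deletions and insertions and N is the number of reference words. The minimal Python sketch below implements that standard definition; the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Assumes a non-empty reference transcript.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("ben eve gidiyorum", "ben ev gidiyorum"))  # 0.3333333333333333
```

The example also shows why agglutination matters for word-level scoring: a single wrong suffix (`eve` recognized as `ev`) counts as a full word substitution.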

Publisher

Politeknik Dergisi

Subject

Colloid and Surface Chemistry, Physical and Theoretical Chemistry

