Customized deep learning based Turkish automatic speech recognition system supported by language model

Published: 2024-04-03 Volume: 10 Page: e1981
ISSN: 2376-5992
Container-title: PeerJ Computer Science
Language: en

Author:

Görmez Yasin¹

Affiliation:

1. Management Information System, Sivas Cumhuriyet University, Sivas, Merkez, Turkiye

Abstract

Background: Automatic speech recognition is now built into many applications used in daily life, so a successful recognition system can make people's daily routines noticeably more convenient. Many such systems have been established for widely spoken languages such as English, but progress has been insufficient for less common languages such as Turkish. In addition, the agglutinative structure of Turkish makes designing a speech recognition system harder than for other language groups. This study therefore proposes deep learning models for automatic speech recognition in Turkish, complemented by a language model.

Methods: The deep learning models were built from convolutional neural networks, gated recurrent units, long short-term memory layers, and transformer layers. The Zemberek library was used to build the language model and improve system performance, and Bayesian optimization was applied to tune the hyper-parameters of the deep learning models. Performance was evaluated with the standard metrics for automatic speech recognition systems: word error rate and character error rate.

Results: With optimal hyper-parameters and without a language model, the models reached a word error rate of 22.2 and a character error rate of 14.05 on the Turkish Microphone Speech Corpus dataset, and a word error rate of 11.5 and a character error rate of 4.15 on the Turkish Speech Corpus dataset. Incorporating the language model brought notable improvements: on the Turkish Microphone Speech Corpus dataset the word error rate fell to 9.85 and the character error rate to 5.35, and on the Turkish Speech Corpus dataset the word error rate fell to 8.4 and the character error rate to 2.7. These results demonstrate that our model outperforms the studies found in the existing literature.
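The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions (Levenshtein edit distance between reference and hypothesis, divided by the reference length, counted over words for word error rate and over characters for character error rate) and assumes the scores are reported as percentages. The function names and the example sentence are hypothetical and do not come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions,
    insertions and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance over reference word count, in percent
    (percent scaling is an assumption of this sketch)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Character-level edit distance over reference length, in percent.
    Spaces are counted as characters in this sketch."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: the hypothesis adds one suffix to the last word.
reference = "bugün hava çok güzel"
hypothesis = "bugün hava çok güzeldi"
wer = word_error_rate(reference, hypothesis)
cer = character_error_rate(reference, hypothesis)
```

In this example a single added suffix turns one of four words into an error (word error rate 25.0) while changing only two of twenty characters (character error rate 10.0). This is one way to see why word error rates tend to sit well above character error rates for an agglutinative language.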

Publisher

PeerJ

Link

https://peerj.com/articles/cs-1981.pdf

References: 45 articles.

1. Convolutional neural networks for speech recognition; Abdel-Hamid; IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014

2. Improving sub-word language modeling for Turkish speech recognition; Akın, 2012

3. EMG Sinyallerinin Kısa Zamanlı Fourier Dönüşüm Özellikleri Kullanılarak Yapay Sinir Ağları ile Sınıflandırılması; Ari; Fırat Üniversitesi Mühendislik Bilimleri Dergisi, 2019

4. Automatic speech recognition: a review; Arora; International Journal of Computer Applications, 2012

5. A detailed survey of Turkish automatic speech recognition; Arslan; Turkish Journal of Electrical Engineering and Computer Sciences, 2020