Integrated End-to-End Automatic Speech Recognition for Languages for Agglutinative Languages

Published:2024-06-21 Issue:6 Volume:23 Page:1-17
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Bekarystankyzy Akbayan¹, Mamyrbayev Orken², Anarbekova Tolganay³

Affiliation:

1. Satbayev University, Almaty, Kazakhstan and Narxoz University, Almaty, Kazakhstan

2. Institute of Information and Computational Technologies, Almaty, Kazakhstan

3. Narxoz University, Almaty, Kazakhstan

Abstract

Automatic speech recognition remains under-researched for low-resource languages, owing to limited training data and the need for new technologies to improve efficiency and performance. The purpose of this work was to study the main aspects of integrated end-to-end speech recognition and the use of modern technologies in the natural language processing of agglutinative languages, including Kazakh. Language models were studied using a combination of comparative, graphic, statistical, and analytical-synthetic methods. The article addresses automatic speech recognition (ASR) in agglutinative languages, particularly Kazakh, through a unified neural network model that integrates both acoustic and language modeling. Employing techniques such as connectionist temporal classification and attention mechanisms, the study focuses on effective speech-to-text transcription for languages with complex morphologies. Transfer learning from high-resource languages helps mitigate data scarcity in languages such as Kazakh, Kyrgyz, Uzbek, Turkish, and Azerbaijani. The research assesses model performance, underscores ASR challenges, and proposes advancements for these languages. It includes a comparative analysis of phonetic and word-formation features in agglutinative Turkic languages, using statistical data. The findings support further research in linguistics and technology for improving speech recognition and synthesis, contributing to voice identification and automation processes.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3663568
