Affiliation:
1. Satbayev University, Almaty, Kazakhstan and Narxoz University, Almaty, Kazakhstan
2. Institute of Information and Computational Technologies, Almaty, Kazakhstan
3. Narxoz University, Almaty, Kazakhstan
Abstract
The relevance of the problem of automatic speech recognition lies in the lack of research for low-resource languages, stemming from limited training data and the necessity for new technologies to enhance efficiency and performance. The purpose of this work was to study the main aspects of integrated end-to-end speech recognition and the use of modern technologies in the natural processing of agglutinative languages, including Kazakh. In this article, the study of language models was carried out using comparative, graphic, statistical, and analytical-synthetic methods, which were used in combination. This article addresses automatic speech recognition (ASR) in agglutinative languages, particularly Kazakh, through a unified neural network model that integrates both acoustic and language modeling. Employing advanced techniques like connectionist temporal classification and attention mechanisms, the study focuses on effective speech-to-text transcription for languages with complex morphologies. Transfer learning from high-resource languages helps mitigate data scarcity in languages such as Kazakh, Kyrgyz, Uzbek, Turkish, and Azerbaijani. The research assesses model performance, underscores ASR challenges, and proposes advancements for these languages. It includes a comparative analysis of phonetic and word-formation features in agglutinative Turkic languages, using statistical data. The findings aid further research in linguistics and technology for enhancing speech recognition and synthesis, contributing to voice identification and automation processes.
Publisher
Association for Computing Machinery (ACM)