Cascade Speech Translation for the Kazakh Language
Published: 2023-08-02
Issue: 15
Volume: 13
Page: 8900
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Kozhirbayev Zhanibek (1), Islamgozhayev Talgat (1)
Affiliation:
1. National Laboratory Astana, Nazarbayev University, Astana 010000, Kazakhstan
Abstract
Speech translation systems have become indispensable in facilitating seamless communication across language barriers. This paper presents a cascade speech translation system tailored specifically for translating speech from the Kazakh language to Russian. The system aims to enable effective cross-lingual communication between Kazakh and Russian speakers, addressing the unique challenges posed by these languages. To develop the cascade speech translation system, we first created a dedicated speech translation dataset ST-kk-ru based on the ISSAI Corpus. The ST-kk-ru dataset comprises a large collection of Kazakh speech recordings along with their corresponding Russian translations. The automatic speech recognition (ASR) module of the system utilizes deep learning techniques to convert spoken Kazakh input into text. The machine translation (MT) module employs state-of-the-art neural machine translation methods, leveraging the parallel Kazakh-Russian translations available in the dataset to generate accurate translations. By conducting extensive experiments and evaluations, we have thoroughly assessed the performance of the cascade speech translation system on the ST-kk-ru dataset. The outcomes of our evaluation highlight the effectiveness of incorporating additional datasets for both the ASR and MT modules. This augmentation leads to a significant improvement in the performance of the cascade speech translation system, increasing the BLEU score by approximately 2 points when translating from Kazakh to Russian. These findings underscore the importance of leveraging supplementary data to enhance the capabilities of speech translation systems.
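The cascade structure described in the abstract, an ASR stage whose text output feeds an MT stage, can be sketched as a simple function composition. This is a minimal illustration, not the authors' implementation: the names `Recognizer`, `Translator`, `make_cascade`, `toy_asr`, `toy_mt`, and `TOY_LEXICON` are all hypothetical, and the toy stages are stand-ins for trained models.

```python
from typing import Callable

# Hypothetical interfaces: the abstract does not specify either module's API.
Recognizer = Callable[[bytes], str]   # Kazakh audio -> Kazakh text
Translator = Callable[[str], str]     # Kazakh text -> Russian text


def make_cascade(asr: Recognizer, mt: Translator) -> Callable[[bytes], str]:
    """Compose an ASR stage and an MT stage into one speech translator."""
    def translate_speech(audio: bytes) -> str:
        transcript = asr(audio)   # stage 1: recognize the spoken Kazakh
        return mt(transcript)     # stage 2: translate the transcript into Russian
    return translate_speech


# Toy stand-ins so the sketch runs; a real system would wrap trained models.
def toy_asr(audio: bytes) -> str:
    # Pretend the "audio" bytes already carry their own transcript.
    return audio.decode("utf-8")


TOY_LEXICON = {"сәлем": "привет", "әлем": "мир"}


def toy_mt(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


cascade = make_cascade(toy_asr, toy_mt)
result = cascade("сәлем әлем".encode("utf-8"))
print(result)
```

Because the MT stage only ever sees the ASR transcript, any recognition error is passed on to translation. That coupling is one plausible reason why improving each module with additional data could raise the end-to-end score.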
Funder
Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
3 articles.