The Task of Post-Editing Machine Translation for the Low-Resource Language

Published:2024-01-05 Issue:2 Volume:14 Page:486
ISSN:2076-3417
Container-title:Applied Sciences
language:en

Author:

Rakhimova Diana¹,², Karibayeva Aidana¹,², Turarbek Assem¹

Affiliation:

1. Department of Information Systems, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan

2. Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan

Abstract

In recent years, machine translation has advanced significantly; however, its effectiveness varies widely depending on the language pair. Low-resource languages such as Kazakh, Uzbek, Kalmyk, and Tatar often struggle to achieve high-quality machine translation. Kazakh is a low-resource, agglutinative language with complex morphology. This article addresses the task of post-editing machine translation for the Kazakh language. It begins by discussing the history and evolution of machine translation and how the field has developed to meet the needs of low-resource languages. The research resulted in a machine translation post-editing system built on modern machine learning methods: the first post-editing stage uses neural machine translation with a BRNN model, and a transformer model is then applied to further edit the text. Complex structural and grammatical forms are processed, and abbreviations are replaced. Practical experiments were conducted on texts from several domains, including news publications, legislative documents, and the IT sphere. The results were evaluated using the BLEU, TER, and WER metrics. The article is intended as a resource for researchers and practitioners in machine translation, describing post-editing strategies for improving translation quality for low-resource languages such as Kazakh and Uzbek.

Funder

Ministry of Science and Higher Education of the Republic of Kazakhstan

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/2/486/pdf
