Abstract
Neural machine translation (NMT) for low-resource languages remains difficult despite the many approaches proposed to address it. The problem is harder still when the few available resources cover only a single domain. In this paper, we examine whether a source-side monolingual dataset of a low-resource language can improve an NMT system for that language. Our experiments use Wolaytta–English translation as the low-resource language pair. We apply self-learning and fine-tuning approaches to improve the NMT system for Wolaytta–English translation using both authentic and synthetic datasets. The self-learning approach improved BLEU scores by +2.7 and +2.4 for Wolaytta–English and English–Wolaytta translation, respectively, over the best-performing baseline model. Further fine-tuning the best-performing self-learning model yielded improvements of +1.2 and +0.6 BLEU for Wolaytta–English and English–Wolaytta translation, respectively. We reflect on our contributions and outline plans for future work in this difficult field of study.
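The self-learning idea summarized in the abstract can be illustrated with a minimal sketch. It assumes the common forward-translation form of self-learning: a baseline model trained on authentic parallel data translates source-side monolingual sentences, and the resulting synthetic pairs are added to the training set. The abstract does not spell out the exact procedure, so this is an assumption. The `train` function is a hypothetical toy stand-in (a word-for-word lookup), not an NMT system, and the placeholder tokens are not real Wolaytta or English data.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def train(pairs: List[Pair]) -> Callable[[str], str]:
    """Hypothetical stand-in for NMT training: learns a word-for-word table."""
    table: Dict[str, str] = {}
    for src, tgt in pairs:
        for s, t in zip(src.split(), tgt.split()):
            table.setdefault(s, t)  # keep the first translation seen

    def translate(sentence: str) -> str:
        # Unknown words are copied through unchanged.
        return " ".join(table.get(w, w) for w in sentence.split())

    return translate


def self_learning(authentic: List[Pair], monolingual_src: List[str]):
    """Assumed self-learning loop: baseline -> synthetic pairs -> retrain."""
    baseline = train(authentic)
    # Translate source-side monolingual text to create synthetic targets.
    synthetic = [(src, baseline(src)) for src in monolingual_src]
    # Retrain on authentic plus synthetic data.
    model = train(authentic + synthetic)
    return model, synthetic


if __name__ == "__main__":
    authentic = [("a b", "x y")]   # placeholder tokens only
    monolingual = ["b a"]
    model, synthetic = self_learning(authentic, monolingual)
    print(synthetic)
    print(model("a b"))
```

In a real setting, the fine-tuning step mentioned in the abstract would correspond to continuing training of the resulting model. Which data that step uses is not stated in the abstract, so it is left out of the sketch.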
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
9 articles.
1. Preserving Sasak Dialectal Features in English to Sasak Machine Translation through Locked Tokenization with Transformer Models;2024 International Seminar on Intelligent Technology and Its Applications (ISITIA);2024-07-10
2. Reframing social media discourse: Converting hate speech to non-hate speech;Journal of Intelligent & Fuzzy Systems;2024-04-28
3. Exploring user perspectives;FORUM. Revue internationale d’interprétation et de traduction / International Journal of Interpretation and Translation;2024-04-25
4. Automatic Translation between Mixtec to Spanish Languages Using Neural Networks;Applied Sciences;2024-03-31
5. Research on Tibetan-Chinese Neural Machine Translation Based on GRU;2023 3rd International Conference on Digital Society and Intelligent Systems (DSInS);2023-11-10