
Neural machine translation of low-resource languages using SMT phrase pair injection

Published: 2020-06-17; Volume: 27; Issue: 3; Pages: 271–292
ISSN: 1351-3249
Container title: Natural Language Engineering
Language: en
Short container title: Nat. Lang. Eng.

Authors:

Sukanta Sen, Mohammed Hasanuzzaman, Asif Ekbal, Pushpak Bhattacharyya, Andy Way

Abstract

Neural machine translation (NMT) has recently shown promising results on publicly available benchmark datasets and is being rapidly adopted in various production systems. However, it requires a high-quality, large-scale parallel corpus, and such a corpus is not always available because building one takes time, money, and professionals. As a result, many existing large-scale parallel corpora are limited to specific languages and domains. In this paper, we propose an effective approach to improve an NMT system in a low-resource scenario without using any additional data. Our approach augments the original training data with parallel phrases extracted from that same training data by a statistical machine translation (SMT) system. Our proposed approach is based on gated recurrent unit (GRU) and transformer networks. We use Hindi–English and Hindi–Bengali datasets for the Health, Tourism, and Judicial (Hindi–English only) domains. We train our NMT models for 10 translation directions, each using only 5–23k parallel sentences. Experiments show improvements of 1.38–15.36 BiLingual Evaluation Understudy (BLEU) points over the baseline systems, and show that transformer models perform better than GRU models in low-resource scenarios. In addition, we find that our proposed method outperforms SMT, which is known to work better than neural models in low-resource scenarios, for some translation directions. To further show the effectiveness of our proposed model, we also apply our approach to another NMT task, namely old-to-modern English translation, using a tiny parallel corpus of only 2.7K sentences. For this task, we use publicly available old–modern English text in which the old English is approximately 1000 years old. Evaluation on this task shows significant improvement over the baseline NMT.
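The augmentation idea in the abstract (extract parallel phrases from the training data itself and add them back as extra training pairs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the paper obtains phrase pairs from an SMT system, whereas this sketch assumes word alignments are already available from some word aligner and applies the standard alignment-consistency rule for phrase extraction, without the usual extension over unaligned words. The names `extract_phrase_pairs`, `inject_phrase_pairs`, the `max_len` limit, and the toy sentence pair are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical types: a word alignment is a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]
SentencePair = Tuple[str, str]


def extract_phrase_pairs(src: List[str], tgt: List[str],
                         alignment: Alignment, max_len: int = 4) -> Set[SentencePair]:
    """Return phrase pairs consistent with the word alignment (no unaligned-word extension)."""
    pairs: Set[SentencePair] = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no word inside the target span may link outside the source span.
            if all(s_start <= s <= s_end for (s, t) in alignment if t_start <= t <= t_end):
                pairs.add((" ".join(src[s_start:s_end + 1]),
                           " ".join(tgt[t_start:t_end + 1])))
    return pairs


def inject_phrase_pairs(corpus: List[SentencePair], alignments: List[Alignment],
                        max_len: int = 4) -> List[SentencePair]:
    """Append extracted phrase pairs to the original corpus, skipping duplicates."""
    augmented = list(corpus)
    seen = set(corpus)
    for (src, tgt), alignment in zip(corpus, alignments):
        for pair in sorted(extract_phrase_pairs(src.split(), tgt.split(), alignment, max_len)):
            if pair not in seen:
                seen.add(pair)
                augmented.append(pair)
    return augmented


if __name__ == "__main__":
    # Toy sentence pair with a one-to-one monotone alignment (illustrative only).
    corpus = [("the house is small", "das Haus ist klein")]
    alignments = [{(0, 0), (1, 1), (2, 2), (3, 3)}]
    augmented = inject_phrase_pairs(corpus, alignments)
    print(len(augmented) - len(corpus))  # 9 new phrase pairs
    for src, tgt in augmented[1:]:
        print(f"{src} ||| {tgt}")
```

Because the injected pairs come from the same training sentences, no external data is used; the model simply sees short aligned fragments alongside the full sentence pairs. The full-sentence phrase pair is dropped as a duplicate of the original entry.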

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software

References (49 articles)

1. Papineni, K., Roukos, S., Ward, T. and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Philadelphia, Pennsylvania, pp. 311–318.

2. Wang, X., Lu, Z., Tu, Z., Li, H., Xiong, D. and Zhang, M. (2017). Neural machine translation advised by statistical machine translation. In Thirty-First AAAI Conference on Artificial Intelligence.

3. Sen, S., Hasanuzzaman, M., Ekbal, A., Bhattacharyya, P. and Way, A. (in press). Take help from elder brother: old to modern English NMT with phrase pair feedback. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing.

4. Sutskever, I., Vinyals, O. and Le, Q.V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pp. 3104–3112.

5. Koehn, P. (2004). Statistical significance tests for machine translation evaluation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.

Cited by 17 articles.

1. Extensive examination of hyper-parameters setting using neural-based methods for limited resources language: Nyishi-English;International Journal of Information Technology;2024-06-14

2. Addressing data scarcity issue for English–Mizo neural machine translation using data augmentation and language model;Journal of Intelligent & Fuzzy Systems;2024-03-05

3. Adopting machine translation in the healthcare sector: A methodological multi-criteria review;Computer Speech & Language;2024-03

4. Neural machine translation for limited resources English-Nyishi pair;Sādhanā;2023-11-02

5. Machine Translation for Historical Research: A case study of Aramaic-Ancient Hebrew Translations;Journal on Computing and Cultural Heritage;2023-10-16