Extremely low-resource neural machine translation for Asian languages

Published:2020-12 Issue:4 Volume:34 Page:347-382
ISSN:0922-6567
Container-title:Machine Translation
language:en
Short-container-title:Machine Translation

Author:

Rubino Raphael, Marie Benjamin, Dabre Raj, Fujita Atsushi, Utiyama Masao, Sumita Eiichiro

Abstract

This paper presents a set of effective approaches to handle extremely low-resource language pairs for self-attention-based neural machine translation (NMT), focusing on English and four Asian languages. Starting from an initial set of parallel sentences used to train bilingual baseline models, we introduce additional monolingual corpora and data processing techniques to improve translation quality. We describe a series of best practices and empirically validate the methods through an evaluation conducted on eight translation directions, based on state-of-the-art NMT approaches such as hyper-parameter search, data augmentation with forward and backward translation in combination with tags and noise, as well as joint multilingual training. Experiments show that the commonly used default architecture of self-attention NMT models does not reach the best results, validating previous work on the importance of hyper-parameter tuning. Additionally, empirical results indicate the amount of synthetic data required to efficiently increase the parameters of the models leading to the best translation quality measured by automatic metrics. We show that the best NMT models trained on a large amount of tagged back-translations outperform three other synthetic data generation approaches. Finally, comparison with statistical machine translation (SMT) indicates that extremely low-resource NMT requires a large amount of synthetic parallel data obtained with back-translation in order to close the performance gap with the preceding SMT approach.

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Linguistics and Language,Language and Linguistics,Software

Link

http://link.springer.com/content/pdf/10.1007/s10590-020-09258-6.pdf

Reference: 59 articles.

1. Aharoni R, Johnson M, Firat O (2019) Massively multilingual neural machine translation. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp 3874–3884. Association for Computational Linguistics, Minneapolis, USA. https://doi.org/10.18653/v1/N19-1388. https://aclweb.org/anthology/N19-1388

2. Artetxe M, Labaka G, Agirre E (2018) Unsupervised statistical machine translation. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp 3632–3642. Association for Computational Linguistics, Brussels, Belgium. https://doi.org/10.18653/v1/D18-1399. https://aclweb.org/anthology/D18-1399

3. Artetxe M, Labaka G, Agirre E, Cho K (2018) Unsupervised neural machine translation. In: Proceedings of the 6th international conference on learning representations. Vancouver, Canada. https://openreview.net/forum?id=Sy2ogebAW

4. Ba JL, Kiros JR, Hinton GE (2016) Layer normalization. arXiv:1607.06450

5. Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: Proceedings of the 3rd international conference on learning representations. San Diego, USA. arXiv:1409.0473

Cited by 14 articles.

1. Exploring Low-Resource Machine Translation: Case Study of Lao-Vietnamese Translation;2024 International Conference on Multimedia Analysis and Pattern Recognition (MAPR);2024-08-15

2. Automatic Translation between Mixtec to Spanish Languages Using Neural Networks;Applied Sciences;2024-03-31

3. The Task of Post-Editing Machine Translation for the Low-Resource Language;Applied Sciences;2024-01-05

4. Reliability of electric vehicle charging infrastructure: A cross-lingual deep learning approach;Communications in Transportation Research;2023-12

5. English-Afaan Oromo Machine Translation Using Deep Attention Neural Network;Optical Memory and Neural Networks;2023-09