Hybrid Pipeline for Building Arabic Tunisian Dialect-standard Arabic Neural Machine Translation Model from Scratch-Reference-Cited by-同舟云学术

Hybrid Pipeline for Building Arabic Tunisian Dialect-standard Arabic Neural Machine Translation Model from Scratch

Published:2023-03-31 Issue:3 Volume:22 Page:1-21
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Kchaou Saméh¹^ORCID,Boujelbane Rahma¹^ORCID,Hadrich Lamia¹^ORCID

Affiliation:

1. University of Sfax, Tunisia

Abstract

Deep Learning is one of the most promising technologies compared to other methods in the context of machine translation. It has been proven to achieve impressive results on large amounts of parallel data for well-endowed languages. Nevertheless, for low-resource languages such as the Arabic Dialects, Deep Learning models failed due to the lack of available parallel corpora. In this article, we present a method to create a parallel corpus to build an effective NMT model able to translate into MSA, Tunisian Dialect texts present in social networks. For this, we propose a set of data augmentation methods aiming to increase the size of the state-of-the-art parallel corpus. By evaluating the impact of this step, we noticed that it has effectively boosted both the size and the quality of the corpus. Then, using the resulted corpus, we compare the effectiveness of CNN, RNN and transformers models to translate Tunisian Dialect into MSA. Experiments show that a better translation is achieved by the transformer model with a BLEU score of 60 vs., respectively, 33.36 and 53.98 with RNN and CNN models.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3568674

Reference34 articles.

1. Neural Machine Translation from Jordanian Dialect to Modern Standard Arabic

2. Alina Karakanta, Jon Dehdari, and Josef van Genabith. 2018. Neural machine translation for low-resource languages without parallel corpora. Mach. Translat. 32 (2018),167–189.

3. Ebtesam H. Almansor and Ahmed Al-Ani. 2018. A hybrid neural machine translation technique for translating low resource languages. In Machine Learning and Data Mining in Pattern Recognition. Springer International Publishing, Cham, 347–356.

4. Translating Dialectal Arabic as Low Resource Language using Word Embedding

5. Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2016. Neural Machine Translation by Jointly Learning to Align and Translate. arxiv:1409.0473 [cs.CL].

Cited by 3 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. A Survey on Machine Translation of Low-Resource Arabic Dialects;2024 15th International Conference on Information and Communication Systems (ICICS);2024-08-13

2. Crossing Linguistic Barriers: A Hybrid Attention Framework for Chinese-Arabic Machine Translation;2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA);2024-02-01

3. Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects;2023 7th IEEE Congress on Information Science and Technology (CiSt);2023-12-16