Affiliation:
1. University of Málaga, Department of Translation and Interpreting, Campus de Teatinos s/n, 29071 Málaga, Spain
Abstract
This study introduces gApp, a Python-based text preprocessing system that automatically identifies discontinuous multiword expressions (MWEs) and converts them into their continuous form in order to enhance neural machine translation (NMT). To this end, an experiment with semi-fixed verb–noun idiomatic combinations (VNICs) is carried out to evaluate to what extent gApp can optimise the performance of the two main free open-source NMT systems (Google Translate and DeepL) when faced with MWE discontinuity in the Spanish-to-English direction. In light of the promising results, the study concludes with suggestions on how to further optimise MWE-aware NMT systems.
Subject
Linguistics and Language, Communication, Language and Linguistics