Linguistic knowledge-based vocabularies for Neural Machine Translation

Published: 2020-07-02 Volume: 27 Issue: 4 Pages: 485-506
ISSN: 1351-3249
Container-title: Natural Language Engineering
Language: en
Short-container-title: Nat. Lang. Eng.

Authors:

Casas Noe, Costa-jussà Marta R., Fonollosa José A. R., Alonso Juan A., Fanlo Ramón

Abstract

Neural networks applied to machine translation need a finite vocabulary to express textual information as a sequence of discrete tokens. The currently dominant subword vocabularies exploit statistically discovered common parts of words to achieve the flexibility of character-based vocabularies without delegating the whole learning of word formation to the neural network. However, they trade this for the inability to apply word-level token associations, which limits their use in semantically rich areas, prevents some transfer learning approaches (e.g., cross-lingual pretrained embeddings), and reduces their interpretability. In this work, we propose new hybrid, linguistically grounded vocabulary definition strategies that keep both the advantages of subword vocabularies and the word-level associations, enabling neural networks to profit from the benefits these associations bring. We test the proposed approaches on both morphologically rich and morphologically poor languages, showing that, for the former, the translation quality of out-of-domain texts improves with respect to a strong subword baseline.
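
To make the subword trade-off concrete, the sketch below learns a subword vocabulary with byte-pair encoding (BPE), one common way of discovering "statistically common parts of words". It is an illustration under stated assumptions, not the paper's hybrid method: that the paper's subword baseline uses exactly this procedure is an assumption, and the toy word-frequency table and the function names (`learn_bpe`, `segment`, etc.) are hypothetical. The point it shows is the one made in the abstract: an unseen word such as "lowest" is split into several subword tokens, so there is no single word-level token to attach a word-level association (for example, a pretrained word embedding) to.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace each whole-symbol occurrence of `pair` with its concatenation."""
    # Lookarounds ensure we only match complete symbols, not substrings of them.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merge operations from a word-frequency table."""
    # Each word starts as space-separated characters plus an end-of-word marker.
    vocab = {" ".join(w) + " </w>": f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first one on ties)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


if __name__ == "__main__":
    # Hypothetical toy corpus: word -> frequency.
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(corpus, num_merges=10)
    print(segment("low", merges))     # → ['low</w>']
    print(segment("lowest", merges))  # → ['low', 'est</w>']
```

Here "low" is frequent enough to become a single token, while the unseen "lowest" is represented as `low` + `est</w>`. A subword model can still translate it compositionally, but a lookup keyed on the whole word "lowest" has nothing to match; this is the loss of word-level associations that the hybrid vocabularies described above aim to avoid.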

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software

References (45 articles; first 5 shown)

1. A Call for Clarity in Reporting BLEU Scores

2. Sennrich, R., Volk, M. and Schneider, G. (2013). Exploiting synergies between open resources for German dependency parsing, POS-tagging, and morphological analysis. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, pp. 601–609.

3. Koehn, P. (2004). Statistical significance tests for machine translation evaluation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain. Association for Computational Linguistics, pp. 388–395.

4. Callison-Burch, C., Osborne, M. and Koehn, P. (2006). Re-evaluating the role of Bleu in machine translation research. In 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy. Association for Computational Linguistics.

5. Linguistic Input Features Improve Neural Machine Translation

Cited by 3 articles.

1. Enhancing Machine Translation Models through Optimal Strategies for Prior Knowledge Integration: A Systematic Review. 2024-05-30.

2. Multi-granularity Knowledge Sharing in Low-resource Neural Machine Translation. ACM Transactions on Asian and Low-Resource Language Information Processing, 2024-02-08.

3. Machine translation and its evaluation: a study. Artificial Intelligence Review, 2023-02-19.