Improved feature decay algorithms for statistical machine translation

Published: 2020-09-22 Issue: 1 Volume: 28 Page: 71-91
ISSN: 1351-3249
Container-title: Natural Language Engineering
Language: en
Short-container-title: Nat. Lang. Eng.

Author:

Poncelas Alberto, Maillette de Buy Wenniger Gideon, Way Andy

Abstract

In machine-learning applications, data selection is of crucial importance if good runtime performance is to be achieved. In a scenario where the test set is accessible when the model is being built, training instances can be selected so that they are the most relevant for the test set. Feature Decay Algorithms (FDA) are a data selection technique that has demonstrated excellent performance in a number of tasks. This method maximizes the diversity of the n-grams in the training set by devaluing those that have already been included. We focus on this method to undertake deeper research on how to select better training data instances. We give an overview of FDA and propose improvements in terms of speed and quality. Using German-to-English parallel data, we first create a novel approach that decreases the execution time of FDA when multiple computation units are available. In addition, we obtain improvements in translation quality by extending FDA with information from the parallel corpus that is generally ignored.
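The selection idea described in the abstract (prefer sentences that share n-grams with the test set, and devalue n-grams that have already been selected) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `ngrams` and `fda_select`, the exponential decay factor of 0.5, the bigram limit, and the length normalisation are all assumptions chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, max_n=2):
    """Return all n-grams of order 1..max_n as tuples."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def fda_select(candidates, test_sentences, k, max_n=2, decay=0.5):
    """Greedily pick k candidate sentences (hypothetical helper).

    Assumption: each test-set n-gram starts with value 1.0 and its value is
    multiplied by `decay` every time it appears in a selected sentence.
    """
    # Features are the n-grams observed in the test set.
    features = set()
    for sentence in test_sentences:
        features.update(ngrams(sentence.split(), max_n))

    # For every candidate, keep only the n-grams that are features.
    cand_feats = [[g for g in ngrams(c.split(), max_n) if g in features]
                  for c in candidates]

    counts = Counter()  # occurrences of each feature in the selected data
    remaining = list(range(len(candidates)))
    selected = []

    def score(i):
        length = len(candidates[i].split())
        if length == 0:
            return 0.0
        # Sum of decayed feature values, normalised by sentence length.
        return sum(decay ** counts[g] for g in cand_feats[i]) / length

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the earliest candidate
        remaining.remove(best)
        selected.append(candidates[best])
        counts.update(cand_feats[best])   # devalue the n-grams just covered
    return selected


pool = ["the cat sat", "the cat sat", "a dog barked", "unrelated words here"]
test_set = ["the cat sat", "a dog barked"]
picked = fda_select(pool, test_set, k=2)
```

In this toy run the duplicate "the cat sat" is skipped in the second round, because its n-grams have already been devalued, so the sketch selects "a dog barked" instead. A linear rescoring loop is used for readability; a practical implementation would likely use a priority queue.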

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software

References: 58 articles.

1. Zens, R., Stanton, D. and Xu, P. (2012). A systematic comparison of phrase table pruning techniques. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea. Association for Computational Linguistics, pp. 972–983.

2. Snover, M., Dorr, B., Schwartz, R., Micciulla, L. and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, Cambridge, Massachusetts, USA. Association for Machine Translation in the Americas, pp. 223–231.

3. Selecting Artificially-Generated Sentences for Fine-Tuning Neural Machine Translation

4. Active learning for statistical phrase-based machine translation

5. Gascó, G., Rocha, M.-A., Sanchis-Trilles, G., Andrés-Ferrer, J. and Casacuberta, F. (2012). Does more data always yield better translations? In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France. Association for Computational Linguistics, pp. 152–161.

Cited by 4 articles.

1. Cellular automata-based MapReduce design: Migrating a big data processing model from Industry 4.0 to Industry 5.0;e-Prime - Advances in Electrical Engineering, Electronics and Energy;2024-06

2. Research on the Optimal Selection Method of Fuzzy Semantics in English Long Sentence Machine Translation;2022 6th Asian Conference on Artificial Intelligence Technology (ACAIT);2022-12-09

3. English-Chinese Machine Translation Based on Transfer Learning and Chinese-English Corpus;Computational Intelligence and Neuroscience;2022-09-27

4. Design of English Translation Mobile Information System Based on Recurrent Neural Network;Mobile Information Systems;2022-08-10