Affiliation:
1. Faculty of Information Engineering and Automation, Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan, China
Abstract
In Neural Machine Translation (NMT), unknown words cannot be translated properly due to vocabulary limitations, which degrades the performance of the translation system. For resource-scarce NMT with a small-scale training corpus, this effect is amplified. The traditional remedy of enlarging the corpus is not applicable, because parallel corpora are difficult to obtain in a resource-scarce setting; however, external knowledge, bilingual lexicons, and other resources are easy to obtain and utilize. We therefore propose a classification lexicon approach for processing unknown words in the Chinese-Vietnamese NMT task. Specifically, three types of Chinese-Vietnamese unknown words are classified, and their corresponding classification lexicons are constructed by word alignment, Wikipedia extraction, and rule-based methods, respectively. After translation, the unknown words are restored from the lexicons in a post-processing step. Experimental results on Chinese-Vietnamese, English-Vietnamese, and Mongolian-Chinese translation show that our approach significantly improves translation accuracy and the performance of NMT, especially in a resource-scarce setting.
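The post-processing step described above — restoring unknown words from a bilingual lexicon after translation — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, the toy lexicon entries, and the assumption that each `<unk>` token in the output is aligned in order to a known source-side word are all assumptions made for illustration.

```python
# Illustrative sketch of lexicon-based post-processing for unknown words.
# The lexicon entries and alignment assumption are hypothetical, not the
# paper's actual data or code.

def restore_unknowns(translated_tokens, aligned_sources, lexicon):
    """Replace <unk> placeholders in NMT output with lexicon translations
    of the aligned source-side words (post-processing step)."""
    restored = []
    unk_index = 0
    for tok in translated_tokens:
        if tok == "<unk>" and unk_index < len(aligned_sources):
            src = aligned_sources[unk_index]
            # Fall back to copying the source word when it is not in the lexicon
            restored.append(lexicon.get(src, src))
            unk_index += 1
        else:
            restored.append(tok)
    return restored

# Toy Chinese-Vietnamese lexicon (illustrative only)
lexicon = {"昆明": "Côn Minh"}
output = restore_unknowns(["Tôi", "đến", "<unk>"], ["昆明"], lexicon)
print(" ".join(output))  # Tôi đến Côn Minh
```

In practice the per-`<unk>` source words would come from a word aligner or attention weights rather than being supplied by hand, and entity words absent from the lexicon could be copied through or transliterated.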
Funder
National key research and development plan project
Natural Science Foundation of Yunnan Province
National Natural Science Foundation of China
Yunnan high-tech industry development project
Publisher
Association for Computing Machinery (ACM)
Cited by
4 articles