Towards Making the Most of BERT in Neural Machine Translation

Published:2020-04-03 Issue:05 Volume:34 Page:9378-9385
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Yang Jiacheng, Wang Mingxuan, Zhou Hao, Zhao Chengqi, Zhang Weinan, Yu Yong, Li Lei

Abstract

GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (CTnmt) that is key to integrating pre-trained LMs into neural machine translation (NMT). Our proposed CTnmt consists of three techniques: a) asymptotic distillation to ensure that the NMT model can retain the previously pre-trained knowledge; b) a dynamic switching gate to avoid catastrophic forgetting of pre-trained knowledge; and c) a strategy to adjust the learning paces according to a scheduled policy. Our machine translation experiments show that CTnmt gains up to 3 BLEU points on the WMT14 English-German language pair, surpassing the previous state-of-the-art pre-training-aided NMT by 1.4 BLEU points. On the large WMT14 English-French task, with 40 million sentence pairs, our base model still significantly improves upon the state-of-the-art Transformer big model by more than 1 BLEU point.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 31 articles.

1. Optimizing Machine Translation Algorithms through Empirical Study of Multi-modal Information Fusion;2024 2nd International Conference on Sustainable Computing and Smart Systems (ICSCSS);2024-07-10

2. Curriculum pre-training for stylized neural machine translation;Applied Intelligence;2024-06-18

3. Promises and Perils of Generative AI in the Healthcare Sector;Advances in Medical Technologies and Clinical Practice;2024-06-14

4. Promise and Challenges of Generative AI in Healthcare Information Systems;Proceedings of the 2024 ACM Southeast Conference on ZZZ;2024-04-18

5. Online English Machine Translation Algorithm Based on Large Language Model;2024 3rd International Conference on Sentiment Analysis and Deep Learning (ICSADL);2024-03-13