
Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

Published:2020-04-03 Issue:05 Volume:34 Page:7839-7846
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Guo Junliang, Tan Xu, Xu Linli, Qin Tao, Chen Enhong, Liu Tie-Yan

Abstract

Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and that both share the same model configuration, a natural idea to improve the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifically, we design a curriculum in the fine-tuning process to progressively switch the training from autoregressive generation to non-autoregressive generation. Experiments on four benchmark translation datasets show that the proposed method improves translation accuracy by more than 1 BLEU point over previous NAT baselines and speeds up inference by more than 10 times over AT baselines.
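
The idea of progressively switching from autoregressive to non-autoregressive training can be illustrated with a small sketch. The abstract does not specify the curriculum, so everything below is an assumption made for illustration: the linear schedule, the per-position random substitution, and the hypothetical names `pacing` and `mix_decoder_inputs` are not taken from the paper. The sketch only shows one plausible way a fine-tuning step could mix AT-style decoder inputs (previous target tokens) with NAT-style inputs (inputs that do not depend on previous target tokens) as training proceeds.

```python
import random


def pacing(step: int, total_steps: int) -> float:
    """Assumed linear schedule: fraction of positions trained NAT-style.

    Returns 0.0 at the start of fine-tuning (pure AT) and 1.0 once
    `step` reaches `total_steps` (pure NAT).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(1.0, max(0.0, step / total_steps))


def mix_decoder_inputs(at_inputs, nat_inputs, step, total_steps, rng=None):
    """Hypothetical helper: swap each AT-style input for its NAT-style
    counterpart with probability pacing(step, total_steps)."""
    if len(at_inputs) != len(nat_inputs):
        raise ValueError("input sequences must have the same length")
    rng = rng or random.Random()
    p = pacing(step, total_steps)
    # rng.random() lies in [0, 1), so p == 0 keeps every AT input
    # and p == 1 replaces every position with the NAT input.
    return [nat if rng.random() < p else at
            for at, nat in zip(at_inputs, nat_inputs)]


if __name__ == "__main__":
    at = ["<s>", "ich", "bin"]      # shifted target tokens (AT-style)
    nat = ["src1", "src2", "src3"]  # placeholder inputs (NAT-style)
    for step in (0, 50, 100):
        print(step, mix_decoder_inputs(at, nat, step, 100, random.Random(0)))
```

At step 0 the mixed inputs equal the AT inputs and at the final step they equal the NAT inputs; intermediate steps give a random blend whose NAT share grows with the schedule. Other schedules (for example step-wise or logarithmic) could be substituted for `pacing` without changing the rest of the sketch.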

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 14 articles.

1. Hierarchical Latent Alignment for Non-Autoregressive Generation under High Compression Ratio;IEICE Transactions on Information and Systems;2024-03-01

2. Contrastive Learning with Global Representation for Face Anti-spoofing;Lecture Notes in Computer Science;2024

3. Filter-GLAT: Filter Glanced Decoder Output for Non-autoregressive Transformer;Lecture Notes in Computer Science;2024

4. CMM: Code-Switching with Manifold Mixup for Cross-Lingual Spoken Language Understanding;2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC);2023-10-01

5. NC2T: Novel Curriculum Learning Approaches for Cross-Prompt Trait Scoring;Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval;2023-07-18