
Non-autoregressive Translation with Layer-Wise Prediction and Deep Supervision

Published: 2022-06-28  Issue: 10  Volume: 36  Pages: 10776-10784
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Huang Chenyang, Zhou Hao, Zaïane Osmar R., Mou Lili, Li Lei

Abstract

How do we perform efficient inference while retaining high translation quality? Existing neural machine translation models, such as the Transformer, achieve high performance, but they decode one word at a time, which makes inference slow. Recent non-autoregressive translation models speed up inference, but their translation quality still lags behind. In this work, we propose DSLP, a highly efficient and high-performance model for machine translation. The key insight is to train a non-autoregressive Transformer with Deep Supervision and to feed additional Layer-wise Predictions. We conducted extensive experiments on four translation tasks (both directions of WMT'14 EN-DE and WMT'16 EN-RO). Results show that our approach consistently improves BLEU scores over the respective base models. Specifically, our best variant outperforms the autoregressive model on three translation tasks while being 14.8 times more efficient at inference.
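The two ideas named in the abstract can be illustrated with a minimal sketch, assuming a decoder whose every layer emits vocabulary logits for all target positions at once. Deep supervision is shown as a cross-entropy loss averaged over all decoder layers; layer-wise prediction is shown as taking each layer's argmax tokens, embedding them, and fusing them with that layer's hidden states before the next layer. The helper names (`deep_supervision_loss`, `layer_wise_prediction`, `fuse`) are hypothetical, and the concatenate-then-project fusion is an assumption for illustration, not the paper's published code.

```python
import math
from typing import List

Vector = List[float]


def cross_entropy(logits: Vector, target: int) -> float:
    # -log softmax(logits)[target], computed stably via log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def deep_supervision_loss(layer_logits: List[List[Vector]], targets: List[int]) -> float:
    # layer_logits[l][t] holds the vocabulary logits of decoder layer l at position t.
    # Every layer is supervised against the same targets (assumed uniform layer weights).
    total = 0.0
    for logits_l in layer_logits:
        per_pos = [cross_entropy(logits_l[t], y) for t, y in enumerate(targets)]
        total += sum(per_pos) / len(targets)
    return total / len(layer_logits)


def layer_wise_prediction(logits_l: List[Vector]) -> List[int]:
    # Greedy (argmax) token at each position, predicted in parallel for one layer.
    return [max(range(len(row)), key=row.__getitem__) for row in logits_l]


def fuse(hidden: List[Vector], pred_tokens: List[int],
         embedding: List[Vector], proj: List[Vector]) -> List[Vector]:
    # Hypothetical fusion: concatenate hidden state h (size d) with the embedding of
    # the predicted token (size d), then project the 2d vector back to size d.
    out = []
    for h, tok in zip(hidden, pred_tokens):
        x = h + embedding[tok]  # list concatenation -> length 2d
        out.append([sum(w * xi for w, xi in zip(row, x)) for row in proj])
    return out
```

In a full model, `fuse` would produce the input to layer l + 1 from layer l's hidden states and predictions, so later layers can condition on earlier layers' guesses about neighbouring tokens, while `deep_supervision_loss` gives every intermediate layer its own training signal.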

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 7 articles.

1. NAT4AT: Using Non-Autoregressive Translation Makes Autoregressive Translation Faster and Better;Proceedings of the ACM Web Conference 2024;2024-05-13

2. Alleviating repetitive tokens in non-autoregressive machine translation with unlikelihood training;Soft Computing;2024-01-03

3. Filter-GLAT: Filter Glanced Decoder Output for Non-autoregressive Transformer;Lecture Notes in Computer Science;2024

4. LayerGLAT: A Flexible Non-autoregressive Transformer for Single-Pass and Multi-pass Prediction;Lecture Notes in Computer Science;2024

5. Online cross-layer knowledge distillation on graph neural networks with deep supervision;Neural Computing and Applications;2023-08-08