Knowledge Distillation: A Method for Making Neural Machine Translation More Efficient

Published: 2022-02-14 Issue: 2 Volume: 13 Page: 88
ISSN: 2078-2489
Container-title: Information
Language: en
Short-container-title: Information

Author:

Jooste Wandri, Haque Rejwanul, Way Andy

Abstract

Neural machine translation (NMT) systems have greatly improved the quality available from machine translation (MT) compared to statistical machine translation (SMT) systems. However, these state-of-the-art NMT models need much more computing power and data than SMT models, a requirement that is unsustainable in the long run and of very limited benefit in low-resource scenarios. To some extent, model compression, and more specifically state-of-the-art knowledge distillation techniques, can remedy this. In this work, we investigate knowledge distillation on a simulated low-resource German-to-English translation task. We show that sequence-level knowledge distillation can be used to train small student models on knowledge distilled from large teacher models. Part of this work examines the influence of hyperparameter tuning on model performance when lowering the number of Transformer heads or limiting the vocabulary size. Interestingly, some student models are more accurate than their teachers, even though student training times are, in some cases, shorter. In a novel contribution, we demonstrate for a specific MT service provider that in the post-deployment phase, distilled student models can reduce emissions, as well as cost purely in monetary terms, by almost 50%.

Funder

Science Foundation Ireland

Marie Skłodowska-Curie

Publisher

MDPI AG

Subject

Information Systems

Link

https://www.mdpi.com/2078-2489/13/2/88/pdf

Cited by 11 articles.

1. Measuring the Effectiveness of Carbon-Aware AI Training Strategies in Cloud Instances: A Confirmation Study;Future Internet;2024-09-13

2. Research on English Chinese Neural Machine Translation based on Improved Deep Q-Network Approach;2024 Second International Conference on Data Science and Information System (ICDSIS);2024-05-17

3. Proposal for a Triple Bottom Line for Translation Automation and Sustainability;The Journal of Specialised Translation;2024-01-30

4. Continual Domain Adaption for Neural Machine Translation;Communications in Computer and Information Science;2023-11-27

5. Incorporating Collaborative and Active Learning Strategies in the Design and Deployment of a Master Course on Computer-Assisted Scientific Translation;Technology, Knowledge and Learning;2023-08-07