Training Tips for the Transformer Model

Published: 2018-04-01 Issue: 1 Volume: 110 Pages: 43-70
ISSN: 1804-0462
Container-title: The Prague Bulletin of Mathematical Linguistics

Author:

Popel Martin, Bojar Ondřej

Abstract

This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability and training time, concluding each experiment with a set of recommendations for fellow researchers. In addition to confirming the general mantra “more data and larger models”, we address scaling to multiple GPUs and provide practical tips for improved training regarding batch size, learning rate, warmup steps, maximum sentence length and checkpoint averaging. We hope that our observations will allow others to get better results given their particular hardware and data constraints.

Publisher

Charles University in Prague, Karolinum Press

Subject

General Engineering

Link

http://www.degruyter.com/view/j/pralin.2018.110.issue-1/pralin-2018-0002/pralin-2018-0002.pdf

Cited by 88 articles.

1. A neural network transformer model for composite microstructure homogenization;Engineering Applications of Artificial Intelligence;2024-08

2. Parameter-efficient fine-tuning of pre-trained code models for just-in-time defect prediction;Neural Computing and Applications;2024-06-03

3. Ön eğitimli Bert modeli ile patent sınıflandırılması;Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi;2024-05-20

4. A Novel Pretrained General-purpose Vision Language Model for the Vietnamese Language;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-05-10

5. Transmission Line Fault Classification Using Conformer Convolution-Augmented Transformer Model;Applied Sciences;2024-05-09