1. Attention is all you need;Vaswani;arXiv:1706.03762,2017
2. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;arXiv:1810.04805,2018
3. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension;Lewis;arXiv:1910.13461,2019
4. Improving language understanding by generative pre-training;Radford,2018
5. Google’s neural machine translation system: Bridging the gap between human and machine translation;Wu;arXiv:1609.08144,2016