1. Attention is all you need;Vaswani;Adv Neural Inf Process Syst,2017
2. BERT: pre-training of deep bidirectional transformers for language understanding;Devlin;Proceedings of NAACL-HLT,2019
3. Exploring the limits of transfer learning with a unified text-to-text transformer;Raffel;J Mach Learn Res,2020
4. Improving language understanding by generative pre-training;Radford,2018
5. Language models are unsupervised multitask learners;Radford;OpenAI blog,2019