1. Attention is all you need;Vaswani;Advances in neural information processing systems,2017
2. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;Proceedings of NAACL-HLT,2019
3. Improving language understanding by generative pre-training;Radford;OpenAI,2018
4. Language models are unsupervised multitask learners;Radford;OpenAI blog,2019
5. Language models are few-shot learners;Brown;Advances in neural information processing systems,2020