1. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;arXiv preprint arXiv:1810.04805,2018
2. Language models are unsupervised multitask learners;Radford;OpenAI blog,2019
3. Language models are few-shot learners;Brown;Advances in neural information processing systems,2020