1. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;arXiv,2018
2. Language models are unsupervised multitask learners;Radford;OpenAI Blog,2019
3. Language models are few-shot learners;Brown;arXiv,2020
4. The illustrated GPT-2 (Visualizing transformer language models);Alammar;http://jalammar.github.io/illustrated-gpt2/