1. Natural language processing: an introduction
2. Exploring the limits of transfer learning with a unified text-to-text transformer;Raffel;arXiv,2019
3. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension;Lewis;arXiv,2019
4. HIBERT: Document level pre-training of hierarchical bidirectional transformers for document summarization;Zhang;arXiv,2019
5. Improving language understanding by generative pre-training;Radford;2018. https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf