1. Bert: Pre-training of deep bidirectional transformers for language understanding;Devlin,2018
2. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension;Lewis
3. Improving language understanding by generative pre-training;Radford,2018
4. Exploring the limits of transfer learning with a unified text-to-text transformer;Raffel,2019