1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding; Devlin et al.; Proceedings of NAACL-HLT, 2019
2. Improving Language Understanding by Generative Pre-Training; Radford et al.; OpenAI technical report, 2018
3. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer; Raffel et al.; The Journal of Machine Learning Research, 2020
4. Pre-trained Summarization Distillation; Shleifer and Rush; arXiv preprint arXiv:2010.13002, 2020