1. BERT: Pre-training of deep bidirectional transformers for language understanding; Devlin
2. Language models are unsupervised multitask learners; Radford; OpenAI blog, 2019
3. Exploring the limits of transfer learning with a unified text-to-text transformer; Raffel; J. Mach. Learn. Res., 2020
4. Aging evolution for image classifier architecture search; Real