1. Devlin et al. BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL-HLT, 2019.
2. Raffel et al. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 2020.
3. Yang et al. XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, 2019.