1. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin,2018
2. RoBERTa: A robustly optimized BERT pretraining approach;Liu,2019
3. Language models are few-shot learners;Brown;Advances in Neural Information Processing Systems,2020
4. Exploring the limits of transfer learning with a unified text-to-text transformer;Raffel;Journal of Machine Learning Research,2020
5. Analysing syntactic and semantic features in pre-trained language models in a fully unsupervised setting;Bölücü