1. Language models are unsupervised multitask learners;radford;OpenAIRE blog,2019
2. Improving language understanding by generative pre-training;radford;ArXiv,2018
3. Compacter: Efficient low-rank hypercomplex adapter layers;karimi mahabadi;Thirty-Fifth Conference on Neural Information Processing Systems,0
4. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks;lu;NeurIPS,2019
5. Roberta: A robustly optimized BERT pretraining approach;liu;CoRR abs/1907 11692,2019