1. BEiT: BERT pre-training of image transformers;Bao;ICLR,2022
2. Language models are few-shot learners;Brown;NeurIPS,2020
3. A simple framework for contrastive learning of visual representations;Chen;ICML,2020
4. NODIS: neural ordinary differential scene understanding
5. BERT: pre-training of deep bidirectional transformers for language understanding;Devlin;NAACL,2019