1. VL-InterpreT: An interactive visualization tool for interpreting vision-language transformers; Aflalo, 2022
2. Flamingo: A visual language model for few-shot learning; Alayrac, 2022
3. Language models are few-shot learners; Brown, 2020
4. Behind the scene: Revealing the secrets of pre-trained vision-and-language models; Cao, 2020
5. UNITER: UNiversal Image-TExt representation learning; Chen, 2020