1. Webly supervised concept expansion for general purpose vision models;Kamath;ECCV,2022
2. Inferring and executing programs for visual reasoning;Johnson;ICCV,2017
3. ViLT: Vision-and-language transformer without convolution or region supervision;Kim;ICML,2021
4. Decomposed prompting: A modular approach for solving complex tasks;Khot;arXiv,2022
5. An empirical study of GPT-3 for few-shot knowledge-based VQA;Yang;AAAI,2022