1. Multimodal few-shot learning with frozen language models;Tsimpoukelli,2021
2. Learning transferable visual models from natural language supervision;Radford,2021
3. Representation learning with contrastive predictive coding;van den Oord,2019
4. A simple framework for contrastive learning of visual representations;Chen,2020
5. VL-BERT: Pre-training of generic visual-linguistic representations;Su,2020