1. Uniclip: Unified framework for contrastive language-image pre-training;Lee;Adv. Neural Inf. Process. Syst.,2022
2. Learning visual representation from modality-shared contrastive language-image pre-training;You,2022
3. Learning transferable visual models from natural language supervision;Radford,2021
4. Representation learning with contrastive predictive coding;Oord,2018
5. An image is worth 16x16 words: Transformers for image recognition at scale;Dosovitskiy,2020