1. Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yun-Hsuan Sung, Zhen Li, Tom Duerig, Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision, in: ICML, 2021, pp. 4904–4916.
2. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever, Learning Transferable Visual Models From Natural Language Supervision, in: ICML, 2021, pp. 8748–8763.
3. Mu et al., SLIP: Self-Supervision Meets Language-Image Pre-Training, CoRR abs/2112.12750, 2021.
4. Li et al., RS-CLIP: Zero Shot Remote Sensing Scene Classification via Contrastive Vision-Language Supervision, Int. J. Appl. Earth Obs. Geoinf., 2023.
5. Zhou et al., Learning to Prompt for Vision-Language Models, IJCV, 2022.