1. FashionVLP: Vision language transformer for fashion retrieval with feedback;goenka;CVPR,2022
2. Composing Text and Image for Image Retrieval - an Empirical Odyssey
3. An image is worth one word: Personalizing text-to-image generation using textual inversion;gal;ArXiv Preprint,2022
4. CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment
5. The many faces of robustness: A critical analysis of out-of-distribution generalization;hendrycks;Proceedings of the IEEE/CVF International Conference on Computer Vision,2021