1. Grounded language-image pre-training;li;Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),2022
2. Product1M: Towards weakly supervised instance-level product retrieval via cross-modal pretraining;zhan;Proceedings of the IEEE/CVF International Conference on Computer Vision,2021
3. Align before fuse: Vision and language representation learning with momentum distillation;li;Advances in neural information processing systems,2021
4. Multi-grained vision language pre-training: Aligning texts with visual concepts;zeng;ArXiv Preprint,2021
5. Object-centric learning with slot attention;locatello;Advances in neural information processing systems,2020