Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

Published: 2022 Page: 69-87
ISSN: 0302-9743
Container-title: Lecture Notes in Computer Science

Author:

You Haoxuan, Zhou Luowei, Xiao Bin, Codella Noel, Cheng Yu, Xu Ruochen, Chang Shih-Fu, Yuan Lu

Publisher

Springer Nature Switzerland

Link

https://link.springer.com/content/pdf/10.1007/978-3-031-19812-0_5

References: 68 articles.

1. Akbari, H., et al.: VATT: transformers for multimodal self-supervised learning from raw video, audio and text. arXiv preprint arXiv:2104.11178 (2021)

2. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)

3. Bossard, L.: Lecture Notes in Computer Science (2014)

4. Cai, Z.: Lecture Notes in Computer Science (2016)

5. Cao, J.: Lecture Notes in Computer Science (2020)

Cited by 8 articles.

1. Feature Disentanglement and Adaptive Fusion for Improving Multi-modal Tracking;Pattern Recognition and Computer Vision;2023-12-28

2. Gauging the Limitations of Natural Language Supervised Text-Image Metrics Learning by Iconclass Visual Concepts;Proceedings of the 7th International Workshop on Historical Document Imaging and Processing;2023-08-25

3. Learning Customized Visual Models with Retrieval-Augmented Knowledge;2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR);2023-06

4. Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens;2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR);2023-06

5. CLIPPO: Image-and-Language Understanding from Pixels Only;2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR);2023-06