Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens

Published: 2023-06
Container-title: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors:

Chen Yuxiao¹, Yuan Jianbo², Tian Yu², Geng Shijie¹, Li Xinyu², Zhou Ding², Metaxas Dimitris N.¹, Yang Hongxia³

Affiliation:

1. Rutgers University

2. ByteDance Inc.

3. Zhejiang University

Publisher

IEEE

Link

http://xplorestaging.ieee.org/ielx7/10203037/10203050/10204468.pdf?arnumber=10204468

References: 43 articles.

1. Deep Residual Learning for Image Recognition

2. Vision-Language Pre-Training with Triple Contrastive Learning

3. HiCLIP: Contrastive language-image pretraining with hierarchy-aware attention;Geng;The Eleventh International Conference on Learning Representations

4. What is considered complete for visual recognition?;Xie;arXiv preprint;2021

5. Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Cited by 8 articles.

1. Multi-Modal CLIP-Informed Protein Editing;2024-07-28

2. A Learning-Based Hierarchical Edge Data Corruption Detection Framework in Edge Intelligence;IEEE Internet of Things Journal;2024-05-15

3. Soft Contrastive Cross-Modal Retrieval;Applied Sciences;2024-02-27

4. Advance One-Shot Multispectral Instance Detection With Text's Supervision;IEEE Signal Processing Letters;2024

5. Representation and Granularity Joint Alignment Framework for Multimodal Sarcasm Detection on Social Media;Lecture Notes in Computer Science;2024