Token Embeddings Alignment for Cross-Modal Retrieval

Published: 2022-10-10
Container-title: Proceedings of the 30th ACM International Conference on Multimedia

Author:

Xie Chen-Wei¹, Wu Jianmin¹, Zheng Yun¹, Pan Pan¹, Hua Xian-Sheng²

Affiliation:

1. Alibaba Group, Hangzhou, China

2. Zhejiang University, Hangzhou, China

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3503161.3548107

References: 43 articles.

1. Learning to Scale Multilingual Representations for Vision-Language Tasks

2. Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

3. UNITER: UNiversal Image-TExt Representation Learning

4. Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. 2020. Randaugment: Practical automated data augmentation with a reduced search space. In CVPRW. 702–703.

5. ImageNet: A large-scale hierarchical image database

Cited by 4 articles.

1. Bridging Modalities: A Survey of Cross-Modal Image-Text Retrieval;Chinese Journal of Information Fusion;2024-06-12

2. Cross-Modal Multi-Source Public Data Fusion and Retrieval using Knowledge Distillation Method;2023 9th International Conference on Computer and Communications (ICCC);2023-12-08

3. Knowledge Decomposition and Replay: A Novel Cross-modal Image-Text Retrieval Continual Learning Method;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26

4. Cross-modal representation learning and generation;Journal of Image and Graphics;2023