1. Hangbo Bao et al. 2022. VLMo: Unified vision-language pre-training with mixture-of-modality-experts. In Advances in Neural Information Processing Systems.
2. Emanuele Bugliarello, Fangyu Liu, Jonas Pfeiffer, Siva Reddy, Desmond Elliott, Edoardo Maria Ponti, and Ivan Vulić. 2022. IGLUE: A benchmark for transfer learning across modalities, tasks, and languages. In International Conference on Machine Learning. PMLR, 2370--2392.
3. Fredrik Carlsson, Philipp Eisen, Faton Rekathati, and Magnus Sahlgren. 2022. Cross-lingual and Multilingual CLIP. In Proceedings of the Thirteenth Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France, 6848--6854. https://aclanthology.org/2022.lrec-1.739
4. Soravit Changpinyo, Piyush Sharma, Nan Ding, and Radu Soricut. 2021. Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts. In CVPR. Computer Vision Foundation / IEEE, 3558--3568.
5. Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C. Lawrence Zitnick. 2015. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325 (2015).