X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval

Published: 2022-06
Container-title: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Author:

Gorti Satya Krishna¹, Vouitsis Noel¹, Ma Junwei¹, Golestan Keyvan¹, Volkovs Maksims¹, Garg Animesh¹, Yu Guangwei¹

Affiliation:

1. Layer 6 AI

Publisher

IEEE

Link

http://xplorestaging.ieee.org/ielx7/9878378/9878366/09879391.pdf?arnumber=9879391

References: 46 articles.

1. Attention is all you need;vaswani;Advances in neural information processing systems,2017

2. Look at what I'm doing: Self-supervised spatial grounding of narrations in instructional videos;tan;Advances in neural information processing systems,2021

3. Learning transferable visual models from natural language supervision;radford;ArXiv Preprint,2021

4. A straightforward framework for video retrieval using CLIP;andrés portillo-quintero;Mexican Conference on Pattern Recognition,0

5. Support-set bottlenecks for video-text representation learning;patrick;ArXiv Preprint,2020

Cited by 60 articles.

1. Text-guided distillation learning to diversify video embeddings for text-video retrieval;Pattern Recognition;2024-12

2. Rethink video retrieval representation for video captioning;Pattern Recognition;2024-12

3. TB-Net: Intra- and inter-video correlation learning for continuous sign language recognition;Information Fusion;2024-09

4. CLIP2TF: Multimodal video–text retrieval for adolescent education;Displays;2024-09

5. LSECA: local semantic enhancement and cross aggregation for video-text retrieval;International Journal of Multimedia Information Retrieval;2024-07-22