1. Attention is all you need;vaswani;Advances in neural information processing systems,2017
2. Look at what I'm doing: Self-supervised spatial grounding of narrations in instructional videos;tan;Advances in neural information processing systems,2021
3. Learning transferable visual models from natural language supervision;radford;ArXiv Preprint,2021
4. A straightforward framework for video retrieval using CLIP;andrés portillo-quintero;Mexican Conference on Pattern Recognition,0
5. Support-set bottlenecks for video-text representation learning;patrick;ArXiv Preprint,2020