1. Max Bain, Arsha Nagrani, Gül Varol, and Andrew Zisserman. 2021. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10--17, 2021. IEEE, 1708--1718. https://doi.org/10.1109/ICCV48922.2021.00175
2. Xing Cheng, Hezheng Lin, Xiangyu Wu, Fan Yang, and Dong Shen. 2021. Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss. CoRR abs/2109.04290 (2021). arXiv:2109.04290 https://arxiv.org/abs/2109.04290
3. Youngok Choi and Edie M. Rasmussen. 2002. Users' relevance criteria in image retrieval in American history. Information Processing & Management 38, 5 (2002), 695--726.
4. TeachText: CrossModal Generalized Distillation for Text-Video Retrieval