Author:
Wang Zhiwen, Zhang Donglin, Hu Zhikai
Funder
National Natural Science Foundation of China
Fundamental Research Funds for the Central Universities
Publisher
Springer Science and Business Media LLC
References (54 articles)
1. Wang J, Hua Y, Yang Y, Kou H (2023) Spsd: similarity-preserving self-distillation for video-text retrieval. Int J Multimed Inf Retr 12(2):32
2. Mithun NC, Li J, Metze F, Chowdhury AKR (2019) Joint embeddings with multimodal cues for video-text retrieval. Int J Multimed Inf Retr 8:3–18
3. Gabeur V, Sun C, Alahari K, Schmid C (2020) Multi-modal transformer for video retrieval. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV 16, pp. 214–229. Springer
4. Liu Y, Albanie S, Nagrani A, Zisserman A (2019) Use what you have: video retrieval using representations from collaborative experts. arXiv preprint arXiv:1907.13487
5. Lei J, Li L, Zhou L, Gan Z, Berg TL, Bansal M, Liu J (2021) Less is more: clipbert for video-and-language learning via sparse sampling. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 7331–7341