1. Mithun NC, Li J, Metze F, Roy-Chowdhury AK (2019) Joint embeddings with multimodal cues for video-text retrieval. Int J Multimed Inf Retr
2. Ugale P, Mali S (2022) Recent trends and techniques of CBIR to enhance retrieval performance. In: Kumar A, Mozar S (eds) ICCCE 2021. Lecture notes in electrical engineering, vol 828. Springer, Singapore
3. Yang X, Wang S, Dong J, Dong J (2022) Video moment retrieval with cross-modal neural architecture search
4. Abu-El-Haija S, Kothari N, Lee J, Natsev P, Toderici G, Varadarajan B, Vijayanarasimhan S (2016) YouTube-8M: a large-scale video classification benchmark. CoRR abs/1609.08675
5. Gorti SK, Vouitsis N, Ma J, Golestan K, Volkovs M, Garg A, Yu G (2022) X-Pool: cross-modal language-video attention for text-video retrieval