1. Amato, G., et al.: VISIONE at video browser showdown 2023. In: Dang-Nguyen, DT., et al. (eds.) MultiMedia Modeling, MMM 2023. LNCS, vol. 13833, pp. 615–621. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-27077-2_48
2. Jónsson, B.Þ., Khan, O.S., Koelma, D.C., Rudinac, S., Worring, M., Zahálka, J.: Exquisitor at the video browser showdown 2020. In: Ro, Y.M., et al. (eds.) MMM 2020, Part II 26. LNCS, vol. 11962, pp. 796–802. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_72
3. Lecture Notes in Computer Science;Y Lee,2021
4. Li, J., Li, D., Savarese, S., Hoi, S.: BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597 (2023)
5. Lokoč, J., et al.: Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS. Multimedia Syst. 29(10), 1–24 (2023)