1. A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, “Learning Transferable Visual Models From Natural Language Supervision,” arXiv:2103.00020, 2021.
2. C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman, P. Schramowski, S. Kundurthy, K. Crowson, L. Schmidt, R. Kaczmarczyk, J. Jitsev, “LAION-5B: An open large-scale dataset for training next generation image-text models,” In Proc. of the 36th Conference on Neural Information Processing Systems (NeurIPS), 2022.
3. G. Awad, K. Curtis, A. A. Butt, J. Fiscus, A. Godil, Y. Lee, A. Delgado, J. Zhang, E. Godard, B. Chocot, L. Diduch, J. Liu, Y. Graham, G. Quénot, “An overview on the evaluated video retrieval tasks at TRECVID 2022,” In Proc. of TRECVID 2022, 2022.
4. K. Ueki, K. Hirakawa, K. Kikuchi, T. Ogawa, T. Kobayashi, “Waseda_Meisei at TRECVID 2017: Ad-hoc Video Search,” In Proc. of TRECVID 2017, 2017.
5. VideoStory