1. Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Apostol Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. 2016. YouTube-8M: A Large-Scale Video Classification Benchmark. ArXiv, Vol. abs/1609.08675 (2016).
2. Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, and Boqing Gong. 2021. VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text. In Neural Information Processing Systems.
3. Max Bain, Arsha Nagrani, Andrew Brown, and Andrew Zisserman. 2020. Condensed movies: Story based retrieval with contextual embeddings. In Proceedings of the Asian Conference on Computer Vision.
4. LIRIS-ACCEDE: A Video Database for Affective Content Analysis
5. MovieCLIP: Visual Scene Recognition in Movies