1. Look, Listen and Learn
2. Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. ArXiv, Vol. abs/2006.11477 (2020).
3. Max Bain, Arsha Nagrani, A. Brown, and Andrew Zisserman. 2020. Condensed Movies: Story Based Retrieval with Contextual Embeddings. ArXiv, Vol. abs/2005.04208 (2020).
4. Max Bain, Arsha Nagrani, Gül Varol, and Andrew Zisserman. 2021. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval. 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021), 1708--1718.
5. LIRIS-ACCEDE: A Video Database for Affective Content Analysis