1. Triantafyllos Afouras, Andrew Owens, Joon Son Chung, and Andrew Zisserman. 2020. Self-Supervised Learning of Audio-Visual Objects from Video. In ECCV.
2. Jean-Baptiste Alayrac, Adrià Recasens, Rosalia Schneider, Relja Arandjelović, Jason Ramapuram, Jeffrey De Fauw, Lucas Smaira, Sander Dieleman, and Andrew Zisserman. 2020. Self-supervised multimodal versatile networks. arXiv preprint arXiv:2006.16228 (2020).
3. Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, and Du Tran. 2020. Self-supervised learning by cross-modal audio-video clustering. In NeurIPS, Vol. 33.
4. Relja Arandjelović and Andrew Zisserman. 2017. Look listen and learn. In ICCV. 609--617.
5. Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016).