1. Deep audio-visual speech recognition;Afouras;IEEE Trans. Pattern Anal. Mach. Intell.,2018
2. Self-supervised learning of audio-visual objects from video;Afouras,2020
3. Self-supervised learning by cross-modal audio-video clustering;Alwassel;Adv. Neural Inf. Proces. Syst.,2020
4. Look, listen and learn;Arandjelovic,2017
5. TempSAL - uncovering temporal information for deep saliency prediction;Aydemir,2023