1. Self-supervised object detection from audio-visual correspondence
2. RAVEL: an annotated corpus for training robots with audiovisual abilities
3. R. Arandjelovic and A. Zisserman. 2017. Look, Listen and Learn. In IEEE/CVF Inter. Conf. on Computer Vision. 609–617.
4. The CAVA corpus
5. Davide Berghi, Adrian Hilton, and Philip J. B. Jackson. 2021. Visually Supervised Speaker Detection and Localization via Microphone Array. In IEEE 23rd Inter. Workshop on Multimedia Signal Processing.