1. Labelling unlabelled videos from scratch with multi-modal self-supervision;Asano;Advances in Neural Information Processing Systems,2020
2. SoundNet: Learning sound representations from unlabeled video;Aytar;Advances in Neural Information Processing Systems,2016
3. A Deep Siamese Network for Scene Detection in Broadcast Videos