1. Self-supervised multimodal versatile networks;alayrac;Advances in Neural Information Processing Systems,2020
2. Self-supervised learning by cross-modal audio-video clustering;alwassel;Advances in Neural Information Processing Systems,2020
3. The million song dataset;bertin-mahieux,2011
4. Learning transferable visual models from natural language supervision;radford;ICML,2021