1. Prétet, Cross-modal music-video recommendation: A study of design choices, 2021.
2. Yi, Cross-modal variational auto-encoder for content-based micro-video background music recommendation, IEEE Transactions on Multimedia, 2021.
3. D. Surís, C. Vondrick, B. Russell, J. Salamon, It’s Time for Artistic Correspondence in Music and Video, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10564–10574.
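4. T. Chen, S. Kornblith, M. Norouzi, G. Hinton, A simple framework for contrastive learning of visual representations, in: Proceedings of the 37th International Conference on Machine Learning, 2020.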
5. M. Cheng, Y. Sun, L. Wang, X. Zhu, K. Yao, J. Chen, G. Song, J. Han, J. Liu, E. Ding, et al., ViSTA: vision and scene text aggregation for cross-modal retrieval, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5184–5193.