1. Places: A 10 Million Image Database for Scene Recognition
2. Audio-Visual Embedding for Cross-Modal Music Video Retrieval through Supervised Deep CCA
3. Telling Left from Right: Learning Spatial Correspondence of Sight and Sound;yang;IEEE Computer Vision and Pattern Recognition (CVPR),2021
4. The Open Images Dataset V4
5. Background music recommendation for video based on multi-modal latent semantic analysis;kuo;2013 IEEE International Conference on Multimedia and Expo (ICME),2013