1. Deep audio-visual learning: A survey; Zhu; Int. J. Autom. Comput., 2021
2. Look, listen and learn; Arandjelović, 2017
3. Cross-modal embeddings for video and audio retrieval; Surís, 2018
4. Emotion-based end-to-end matching between image and music in valence-arousal space; Zhao, 2020
5. Query by video: Cross-modal music retrieval; Li, 2019