1. Deep audio-visual speech recognition; Afouras; IEEE TPAMI, 2018
2. SUGILITE
3. SoundSpaces: Audio-Visual Navigation in 3D Environments
4. What makes multi-modal learning better than single (provably); Huang; NeurIPS, 2021
5. Foundations and recent trends in multimodal machine learning: Principles, challenges, and open questions; Liang; CVPR, 2022