1. VATT: Transformers for multimodal self-supervised learning from raw video, audio and text;Akbari Hassan;Advances in Neural Information Processing Systems,2021
2. Self-supervised multimodal versatile networks;Alayrac Jean-Baptiste;Advances in Neural Information Processing Systems,2020
3. Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
4. Aiding Intra-Text Representations with Visual Context for Multimodal Named Entity Recognition
5. Tadas Baltrušaitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multimodal machine learning: A survey and taxonomy. IEEE transactions on pattern analysis and machine intelligence, Vol. 41, 2 (2018), 423--443.