1. Attention bottlenecks for multimodal fusion;nagrani;NeurIPS,2021
2. Efficient estimation of word representations in vector space;mikolov;ArXiv Preprint,2013
3. Unpaired Image-to-Speech Synthesis With Multimodal Information Bottleneck
4. CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval;luo;ArXiv Preprint,2021
5. UniVL: A unified video and language pre-training model for multimodal understanding and generation;luo;ArXiv Preprint,2020