1. Hassan Akbari , Liangzhe Yuan , Rui Qian , Wei-Hong Chuang , Shih-Fu Chang , Yin Cui , and Boqing Gong . 2021 . Vatt: Transformers for multimodal self-supervised learning from raw video, audio and text. NIPS (2021). Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, and Boqing Gong. 2021. Vatt: Transformers for multimodal self-supervised learning from raw video, audio and text. NIPS (2021).
2. Jean-Baptiste Alayrac Jeff Donahue Pauline Luc Antoine Miech Iain Barr Yana Hasson Karel Lenc Arthur Mensch Katie Millican Malcolm Reynolds etal 2022. Flamingo: a visual language model for few-shot learning. arXiv preprint arXiv:2204.14198 (2022). Jean-Baptiste Alayrac Jeff Donahue Pauline Luc Antoine Miech Iain Barr Yana Hasson Karel Lenc Arthur Mensch Katie Millican Malcolm Reynolds et al. 2022. Flamingo: a visual language model for few-shot learning. arXiv preprint arXiv:2204.14198 (2022).
3. Stanislaw Antol , Aishwarya Agrawal , Jiasen Lu , Margaret Mitchell , Dhruv Batra , C Lawrence Zitnick , and Devi Parikh . 2015 . Vqa: Visual question answering. In ICCV. Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. 2015. Vqa: Visual question answering. In ICCV.
4. Hangbo Bao , Wenhui Wang , Li Dong , Qiang Liu , Owais Khan Mohammed , Kriti Aggarwal, Subhojit Som, and Furu Wei. 2021 . Vlmo : Unified vision-language pre-training with mixture-of-modality-experts. arXiv preprint arXiv:2111.02358 (2021). Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, and Furu Wei. 2021. Vlmo: Unified vision-language pre-training with mixture-of-modality-experts. arXiv preprint arXiv:2111.02358 (2021).
5. Hao Chen , Ran Tao , Han Zhang , Yidong Wang , Wei Ye , Jindong Wang , Guosheng Hu , and Marios Savvides . 2022. Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets. arXiv preprint arXiv:2208.07463 ( 2022 ). Hao Chen, Ran Tao, Han Zhang, Yidong Wang, Wei Ye, Jindong Wang, Guosheng Hu, and Marios Savvides. 2022. Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets. arXiv preprint arXiv:2208.07463 (2022).