1. Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, and Hanie Sedghi. 2021. Exploring the limits of large scale pre-training. arXiv preprint arXiv:2110.02095 (2021).
2. Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, et al. 2022. Flamingo: a Visual Language Model for Few-Shot Learning. arXiv preprint arXiv:2204.14198 (2022).
3. Tadas Baltrušaitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multimodal machine learning: A survey and taxonomy. IEEE transactions on pattern analysis and machine intelligence 41, 2 (2018), 423--443.
4. Maurits Bleeker and Maarten de Rijke. 2020. Bidirectional Scene Text Recognition with a Single Decoder. In ECAI 2020: 24th European Conference on Artificial Intelligence. IOS Press, 2664--2672.
5. Do Lessons from Metric Learning Generalize to Image-Caption Retrieval?