1. Hangbo Bao, Li Dong, and Furu Wei. 2021. BEiT: BERT Pre-Training of Image Transformers. arXiv:2106.08254 (2021).
2. Emanuel Ben-Baruch, Tal Ridnik, Nadav Zamir, Asaf Noy, Itamar Friedman, Matan Protter, and Lihi Zelnik-Manor. 2021. Asymmetric Loss For Multi-Label Classification. arXiv:2009.14119 (2021).
3. Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, and Desmond Elliott. 2020. Multimodal Pretraining Unmasked: Unifying the Vision and Language BERTs. arXiv:2011.15124 (2020).
4. Christel Chappuis, Sylvain Lobry, Benjamin Kellenberger, Bertrand Le Saux, and Devis Tuia. 2021. How to find a good image-text embedding for remote sensing visual question answering? arXiv:2109.11848 (2021).
5. Prompt-RSVQA: Prompting visual context to a language model for Remote Sensing Visual Question Answering.