1. An image is worth 16x16 words: Transformers for image recognition at scale;dosovitskiy,2020
2. Attention is all you need;vaswani;NeurIPS,2017
3. RSVQA meets BigEarthNet: A new, large-scale, visual question answering dataset for remote sensing;lobry;IEEE IGARSS,2021
4. Well-read students learn better: On the importance of pre-training compact models;turc,2019
5. XCiT: Cross-covariance image transformers;el-nouby;NeurIPS,2021