1. Arian Bakhtiarnia et al. 2021. Multi-exit vision transformer for dynamic inference. arXiv preprint arXiv:2106.15183 (2021).
2. Nicolas Carion et al. 2020. End-to-end object detection with transformers. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. Springer, 213–229.
3. Krzysztof Choromanski et al. 2020. Rethinking attention with performers. arXiv preprint arXiv:2009.14794 (2020).
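4. Jia Deng et al. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.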
5. Jacob Devlin et al. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs.CL]