1. RoBERTa: A robustly optimized BERT pretraining approach;Liu,2019
2. An image is worth 16x16 words: Transformers for image recognition at scale;Dosovitskiy,2020
3. ViLT: Vision-and-language transformer without convolution or region supervision;Kim,2021
4. Zero time waste: recycling predictions in early exit neural networks;Wołczyk;Advances in Neural Information Processing Systems,2021