1. Attention Is All You Need;Vaswani;Advances in Neural Information Processing Systems,2017
2. Masked Autoencoders Are Scalable Vision Learners;He;IEEE/CVF Conference on Computer Vision and Pattern Recognition,2022
3. Training Data-efficient Image Transformers & Distillation through Attention;Touvron;International Conference on Machine Learning,2021
4. SeMask: Semantically Masked Transformers for Semantic Segmentation;Jain;arXiv Preprint,2021