1. Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition
2. How to train your ViT? data, augmentation, and regularization in vision transformers;steiner,2021
3. See better before looking closer: Weakly supervised data augmentation network for fine-grained visual classification;hu,2019
4. An image is worth 16x16 words: Transformers for image recognition at scale;dosovitskiy;Proc Int Conf Learn Representations,2020
5. Attentional pooling for action recognition;girdhar;Proc Adv Neural Inf Process Syst,2017