1. Bao et al. BEiT: BERT pre-training of image transformers.
2. Brown et al. Language models are few-shot learners. Adv. Neural Inform. Process. Syst., 2020.
3. COCO-Stuff: Thing and stuff classes in context.
4. Rethinking why intermediate-task fine-tuning works.
5. Chen et al. AdaptFormer: Adapting vision transformers for scalable visual recognition. 2022.