1. Unsupervised Visual Representation Learning by Context Prediction
2. Bert: Pre-training of deep bidirectional transformers for language understanding;devlin;ArXiv Preprint,2018
3. An image is worth 16×16 words: Transformers for image recognition at scale;dosovitskiy;ArXiv Preprint,2020
4. Peco: Perceptual codebook for bert pre-training of vision transformers;dong;ArXiv Preprint,2021
5. ImageNet: A large-scale hierarchical image database