1. ImageNet: A large-scale hierarchical image database
2. Barlow Twins: Self-supervised learning via redundancy reduction;Zbontar;ICML,2021
3. Fine-grained Image Captioning with CLIP Reward
4. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;arXiv,2018