1. Deep Residual Learning for Image Recognition
2. Vision-Language Pre-Training with Triple Contrastive Learning
3. HiCLIP: Contrastive language-image pretraining with hierarchy-aware attention. Geng. In The Eleventh International Conference on Learning Representations.
4. What is considered complete for visual recognition? Xie. arXiv preprint, 2021.
5. Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning