1. VinVL: Revisiting Visual Representations in Vision-Language Models
2. Multi-scale vision longformer: A new vision transformer for high-resolution image encoding; Zhang; Proceedings of the IEEE/CVF International Conference on Computer Vision
3. mixup: Beyond empirical risk minimization; Zhang; ArXiv Preprint, 2017
4. From recognition to cognition: Visual commonsense reasoning; Zellers; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
5. Oscar: Object-semantics aligned pre-training for vision-language tasks; Li; European Conference on Computer Vision, 2020