1. CLIPScore: A Reference-free Evaluation Metric for Image Captioning
2. Open-vocabulary Object Detection via Vision and Language Knowledge Distillation. Gu et al., International Conference on Learning Representations (ICLR)
3. MDETR - Modulated Detection for End-to-End Multi-Modal Understanding
4. Categorical Reparameterization with Gumbel-Softmax. Jang et al., International Conference on Learning Representations (ICLR)
5. RegionCLIP: Region-based Language-Image Pretraining