1. CLIP-Adapter: Better vision-language models with feature adapters;gao;arXiv preprint,2021
2. End-to-End Human Object Interaction Detection with HOI Transformer
3. Visual semantic role labeling;gupta;arXiv Computer Vision and Pattern Recognition,2015
4. Open-vocabulary object detection via vision and language knowledge distillation;gu;Learning,2021