1. Zero-Shot Object Detection
2. Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, and Stefano Soatto. 2022. X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks. European Conference on Computer Vision (2022).
3. Correspondence Matters for Video Referring Expression Comprehension
4. Towards Bridging Video and Language by Caption Generation and Sentence Localization
5. Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model