1. VSE++: Improving visual-semantic embeddings with hard negatives;Faghri;arXiv preprint,2017
2. GLIPv2: Unifying localization and vision-language understanding;Zhang;arXiv preprint,2022
3. Open-Vocabulary Object Detection Using Captions
4. DeViSE: A deep visual-semantic embedding model;Frome;Advances in Neural Information Processing Systems,2013