1. VQA: Visual Question Answering
2. ViViT: A Video Vision Transformer
3. A CLIP-hitchhiker’s guide to long video retrieval;Bain,2022
4. Bridging the gap between object and image-level representations for open-vocabulary detection;Bangalath
5. Monitoring animal behavior in the smart vivarium;Belongie,2005