Subject
Software, Information Systems, Human-Computer Interaction, Computer Vision and Pattern Recognition, Computer Science Applications, Artificial Intelligence