Authors:
Zhang Shuo, Yang Biao, Li Zhang, Ma Zhiyin, Liu Yuliang, Bai Xiang
Publisher:
Springer Nature Switzerland
References (43 articles):
1. Alayrac, J.B., et al.: Flamingo: a visual language model for few-shot learning. Adv. Neural. Inf. Process. Syst. 35, 23716–23736 (2022)
2. Antol, S., et al.: VQA: visual question answering. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2425–2433 (2015)
3. Bai, J., et al.: Qwen-VL: a frontier large vision-language model with versatile abilities. arXiv preprint arXiv:2308.12966 (2023)
4. Biten, A.F., et al.: Scene text visual question answering. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4291–4301 (2019)
5. Chen, W., et al.: TabFact: a large-scale dataset for table-based fact verification. arXiv preprint arXiv:1909.02164 (2019)