Weakly-Supervised Grounding for VQA with Dual Visual-Linguistic Interaction

Published: 2024
Pages: 156-169
ISSN: 0302-9743
Container-title: Lecture Notes in Computer Science
Language: en

Author:

Liu Yi, Pan Junwen, Wang Qilong, Chen Guanlin, Nie Weiguo, Zhang Yudong, Gao Qian, Hu Qinghua, Zhu Pengfei

Publisher

Springer Nature Singapore

Link

https://link.springer.com/content/pdf/10.1007/978-981-99-8850-1_13

References: 28 articles.

1. Agrawal, A., Batra, D., Parikh, D., Kembhavi, A.: Don’t just assume; look and answer: overcoming priors for visual question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4971–4980 (2018)

2. Antol, S., et al.: VQA: visual question answering. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2425–2433 (2015)

3. Chen, C., Anjum, S., Gurari, D.: Grounding answers for visual questions asked by visually impaired people. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19098–19107 (2022)

4. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)

5. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: North American Chapter of the Association for Computational Linguistics (2018)

Cited by 1 article.

1. LCV2: A Universal Pretraining-Free Framework for Grounded Visual Question Answering; Electronics; 2024-05-25