From Pixels to Explanations: Uncovering the Reasoning Process in Visual Question Answering

Published: 2023-12-06
Container-title: ACM Multimedia Asia 2023

Author:

Zhang Siqi¹, Liu Jing², Wei Zhihua¹

Affiliation:

1. Tongji University, CN

2. National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, CN

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3595916.3626376

References: 68 articles.

1. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

2. Saeed Amizadeh, Hamid Palangi, Alex Polozov, Yichen Huang, and Kazuhito Koishida. 2020. Neuro-symbolic visual reasoning: Disentangling. In International Conference on Machine Learning. PMLR, 279–290.

3. Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. 2016. Spice: Semantic propositional image caption evaluation. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V 14. Springer, 382–398.

4. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering