CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

Published: 2022 Page: 69-88
ISSN: 0302-9743
Container-title: xxAI - Beyond Explainable AI

Author:

Salewski Leonard, Koepke A. Sophia, Lensch Hendrik P. A., Akata Zeynep

Abstract

Providing explanations in the context of Visual Question Answering (VQA) presents a fundamental problem in machine learning. To obtain detailed insights into the process of generating natural language explanations for VQA, we introduce the large-scale CLEVR-X dataset that extends the CLEVR dataset with natural language explanations. For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations which are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information that is necessary to answer a given question. We conducted a user study to confirm that the ground-truth explanations in our proposed dataset are indeed complete and relevant. We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset. Furthermore, we provide a detailed analysis of the explanation generation quality for different question and answer types. Additionally, we study the influence of using different numbers of ground-truth explanations on the convergence of natural language generation (NLG) metrics. The CLEVR-X dataset is publicly available at https://github.com/ExplainableML/CLEVR-X.

Publisher

Springer International Publishing

Link

https://link.springer.com/content/pdf/10.1007/978-3-031-04083-2_5

References: 60 articles.

1. Agrawal, A., Batra, D., Parikh, D.: Analyzing the behavior of visual question answering models. In: EMNLP, pp. 1955–1960. Association for Computational Linguistics (2016)

2. Agrawal, A., Batra, D., Parikh, D., Kembhavi, A.: Don’t just assume; look and answer: overcoming priors for visual question answering. In: CVPR, pp. 4971–4980 (2018)

3. Ahn, L.V., Blum, M., Hopper, N.J., Langford, J.: CAPTCHA: using hard AI problems for security. In: Biham, E. (ed.) EUROCRYPT 2003. LNCS, vol. 2656, pp. 294–311. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-39200-9_18

4. Anderson, P., et al.: Bottom-up and top-down attention for image captioning and visual question answering. In: CVPR, pp. 6077–6086 (2018)

5. Antol, S., et al.: VQA: Visual Question Answering. In: ICCV, pp. 2425–2433 (2015)

Cited by 8 articles.

1. ViCLEVR: a visual reasoning dataset and hybrid multimodal fusion model for visual question answering in Vietnamese;Multimedia Systems;2024-07-06

2. Zero-Shot Translation of Attention Patterns in VQA Models to Natural Language;Lecture Notes in Computer Science;2024

3. Study on the Helpfulness of Explainable Artificial Intelligence;Communications in Computer and Information Science;2024

4. MERGE: Multi-Entity Relational Reasoning Based Explanation in Visual Question Answering;2023 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech);2023-11-14

5. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence;Information Fusion;2023-11