Enhancing Multi-modal Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation

Published: 2023-10-26
Container-title: Proceedings of the 31st ACM International Conference on Multimedia

Author:

Yang Qian¹, Chen Qian², Wang Wen², Hu Baotian¹, Zhang Min¹

Affiliation:

1. Harbin Institute of Technology, Shenzhen, China

2. Unaffiliated, Hangzhou, China

Funder

Strategic Emerging Industry Development Special Funds of Shenzhen

Natural Science Foundation of China

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3581783.3611964

References (40 articles)

1. Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, and Caiming Xiong. 2019. Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering. In International Conference on Learning Representations.

2. WebQA: Multihop and Multimodal QA

3. Wenhu Chen, Hexiang Hu, Xi Chen, Pat Verga, and William Cohen. 2022. MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text. (Dec. 2022), 5558-5570.

4. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations.

5. Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. 2019. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 2368-2378.

Cited by 1 article.

1. UniRaG: Unification, Retrieval, and Generation for Multimodal Question Answering With Pre-Trained Language Models; IEEE Access; 2024