Complex visual question answering based on uniform form and content

Published: 2024-03 Issue: 6 Volume: 54 Page: 4602-4620
ISSN: 0924-669X
Container-title: Applied Intelligence
Language: en
Short-container-title: Appl Intell

Author:

Chen Deguang, Chen Jianrui, Fang Chaowei, Zhang Zhichao

Funder

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10489-024-05383-4.pdf

References: 63 articles.

1. Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Zitnick CL, Parikh D (2015) VQA: visual question answering. In: Proceedings of the IEEE international conference on computer vision. pp 2425–2433

2. Pezeshkpour P, Chen L, Singh S (2018) Embedding multimodal relational data for knowledge base completion. In: Empirical methods in natural language processing (EMNLP). pp 3208–3218

3. Perez E, Strub F, De Vries H, Dumoulin V, Courville A (2018) FiLM: visual reasoning with a general conditioning layer. In: Proceedings of the AAAI conference on artificial intelligence, vol 32, no 1, pp 3942–3951

4. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the conference of the North American chapter of the association for computational linguistics: human language technologies (NAACL-HLT)

5. Lu J, Batra D, Parikh D, Lee S (2019) ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in neural information processing systems, vol 32

Cited by 1 article.

1. A text mining-based approach for comprehensive understanding of Chinese railway operational equipment failure reports;2024-08-12