Visual question answering based on local-scene-aware referring expression generation

Published:2021-07 Issue: Volume:139 Page:158-167
ISSN:0893-6080
Container-title:Neural Networks
language:en
Short-container-title:Neural Networks

Author:

Kim Jung-Jun, Lee Dong-Gyu, Wu Jialin, Jung Hong-Gyu, Lee Seong-Whan

Funder

Institute for Information Communication Technology Planning and Evaluation

Publisher

Elsevier BV

Subject

Artificial Intelligence,Cognitive Neuroscience

References: 50 articles.

1. Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., et al. (2018). Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6077–6086).

2. Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., et al. (2015). VQA: Visual question answering. In Proceedings of the IEEE international conference on computer vision (pp. 2425–2433).

3. Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In International conference on learning representations (pp. 1–15).

4. Ben-Younes, H., Cadene, R., Cord, M., & Thome, N. (2017). MUTAN: Multimodal Tucker fusion for visual question answering. In Proceedings of the IEEE international conference on computer vision (pp. 2612–2620).

5. Cadene, R., Ben-Younes, H., Cord, M., & Thome, N. (2019). MUREL: Multimodal relational reasoning for visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1989–1998).

Cited by 18 articles.

1. Deep scene understanding with extended text description for human object interaction detection;Expert Systems with Applications;2025-01

2. Exploring refined dual visual features cross-combination for image captioning;Neural Networks;2024-12

3. Multi-modal long document classification based on Hierarchical Prompt and Multi-modal Transformer;Neural Networks;2024-08

4. Improving few-shot relation extraction through semantics-guided learning;Neural Networks;2024-01

5. Deep Scene Understanding with Extended Text Description for Human;2024