Author:
Ding Yihao, Luo Siwen, Chung Hyunsuk, Han Soyeon Caren
Publisher
Springer Nature Switzerland
Reference: 35 articles.
1. Antol, S., et al.: VQA: visual question answering. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2425–2433 (2015)
2. Biten, A.F., et al.: Scene text visual question answering. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4291–4301 (2019)
3. Chaudhry, R., Shekhar, S., Gupta, U., Maneriker, P., Bansal, P., Joshi, A.: LEAF-QA: locate, encode & attend for figure question answering. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3512–3521 (2020)
4. Lecture Notes in Computer Science;B Davis,2021
5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), pp. 4171–4186 (2019)
Cited by
2 articles.
1. On Leveraging Multi-Page Element Relations in Visually-Rich Documents;2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC);2024-07-02
2. Survey of Multimodal Medical Question Answering;BioMedInformatics;2023-12-31