Publisher: Springer Science and Business Media LLC
References
1. Agrawal, A., Batra, D., & Parikh, D. (2016). Analyzing the behavior of visual question answering models. In Proceedings of the 2016 conference on empirical methods in natural language processing (EMNLP 2016) (pp. 1955–1960).
2. Anjum, T., & Khan, N. (2023). CALText: Contextual attention localization for offline handwritten text. Neural Processing Letters. https://doi.org/10.1007/s11063-023-11258-5
3. Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual question answering. In Proceedings of the 2015 international conference on computer vision (ICCV 2015) (pp. 2425–2433).
4. Ba, J. L., Mnih, V., & Kavukcuoglu, K. (2015). Multiple object recognition with visual attention. In 3rd international conference on learning representations (ICLR 2015), conference track proceedings (pp. 1–10).
5. Biten, A. F., Tito, R., Mafla, A., Gomez, L., Rusiñol, M., Valveny, E., Jawahar, C. V., & Karatzas, D. (2019). Scene text visual question answering. In Proceedings of the 2019 international conference on computer vision (ICCV 2019).