Oscar: Omni-scale robust contrastive learning for Text-VQA
-
Published:2024-12
Issue:
Volume:255
Page:124785
-
ISSN:0957-4174
-
Container-title:Expert Systems with Applications
-
language:en
Author:
Yue Jianyu, Bi Xiaojun, Chen Zheng