Publisher: Springer Nature Switzerland
References (62 articles)
1. Antol, S., et al.: VQA: visual question answering. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 2425–2433. IEEE, Santiago, December 2015. https://doi.org/10.1109/ICCV.2015.279
2. Aytar, Y., Castrejon, L., Vondrick, C., Pirsiavash, H., Torralba, A.: Cross-modal scene networks. IEEE Trans. Pattern Anal. Mach. Intell. 40(10), 2303–2314 (2017)
3. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv:1607.06450 [cs, stat], July 2016. http://arxiv.org/abs/1607.06450
4. Bassani, E.: Lecture Notes in Computer Science (2022)
5. Bugliarello, E., Cotterell, R., Okazaki, N., Elliott, D.: Multimodal pretraining unmasked: a meta-analysis and a unified framework of vision-and-language BERTs. Trans. Assoc. Comput. Linguist. 9, 978–994 (2021). https://doi.org/10.1162/tacl_a_00408
Cited by 6 articles.