Author:
Chen Yen-Chun, Li Linjie, Yu Licheng, El Kholy Ahmed, Ahmed Faisal, Gan Zhe, Cheng Yu, Liu Jingjing
Publisher
Springer International Publishing
References: 50 articles.
1. Alberti, C., Ling, J., Collins, M., Reitter, D.: Fusion of detected objects in text for visual question answering. In: EMNLP (2019)
2. Anderson, P., et al.: Bottom-up and top-down attention for image captioning and visual question answering. In: CVPR (2018)
3. Antol, S., et al.: VQA: visual question answering. In: ICCV (2015)
4. Cao, J., Gan, Z., Cheng, Y., Yu, L., Chen, Y.C., Liu, J.: Behind the scene: revealing the secrets of pre-trained vision-and-language models. arXiv preprint arXiv:2005.07310 (2020)
5. Chen, L., Gan, Z., Cheng, Y., Li, L., Carin, L., Liu, J.: Graph optimal transport for cross-domain alignment. In: ICML (2020)
Cited by
815 articles.