Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Reference63 articles.
1. Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Zitnick CL, Parikh D (2015) VQA: visual question answering. In: Proceedings of the IEEE international conference on computer vision. pp 2425–2433
2. Pezeshkpour P, Chen L, Singh S (2018) Embedding multimodal relational data for knowledge base completion. In: Empirical methods in natural language processing (EMNLP). pp 3208–3218
3. Perez E, Strub F, De Vries H, Dumoulin V, Courville A (2018) Film: visual reasoning with a general conditioning layer. In: Proceedings of the AAAI conference on artificial intelligence, vol 32, no 1, pp 3942–3951
4. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Empirical methods in natural language processing (EMNLP)
5. Lu J, Batra D, Parikh D, Lee S (2019) VilBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in neural information processing systems vol 32
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献