Funders
National Natural Science Foundation of China
Shanghai Municipality Science and Technology Commission
Shanghai Municipal Human Resources and Social Security Bureau
Shanghai Municipality
References (46 articles)
1. S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C.L. Zitnick, D. Parikh, VQA: Visual question answering, in: Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 2425–2433.
2. X. Yu, H. Zhang, Y. Song, Y. Song, C. Zhang, What you see is what you get: Visual pronoun coreference resolution in dialogues, in: Proc. Conf. Empirical Methods Nat. Lang. Process. Int. Joint Conf. Nat. Lang. Process., 2019, pp. 5123–5132.
3. A. Frome et al., DeViSE: A deep visual-semantic embedding model, in: Adv. Neural Inf. Process. Syst., 2013.
4. Li et al., Image retrieval from remote sensing big data: A survey, Inf. Fusion, 2021.
5. A. Das, S. Kottur, K. Gupta, A. Singh, D. Yadav, J.M. Moura, D. Parikh, D. Batra, Visual dialog, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 326–335.