Authors:
Chen Feng, Li Xinyi, Tang Jintao, Li Shasha, Wang Ting
Publisher:
Springer Nature Switzerland
References (28 articles):
1. Anderson, P., et al.: Bottom-up and top-down attention for image captioning and visual question answering. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6077–6086 (2018)
2. Antol, S., Agrawal, A., Lu, J., Mitchell, M., Parikh, D.: VQA: visual question answering. Int. J. Comput. Vis. 123(1), 4–31 (2015)
3. Dong, G., Zhang, X., Lan, L., Wang, S., Luo, Z.: Label guided correlation hashing for large-scale cross-modal retrieval. Multimed. Tools Appl. 78(21), 30895–30922 (2019). https://doi.org/10.1007/s11042-019-7192-5
4. Feng, Y., Chen, X., Lin, B.Y., Wang, P., Yan, J., Ren, X.: Scalable multi-hop relational reasoning for knowledge-aware question answering. In: Conference on Empirical Methods in Natural Language Processing (2020)
5. Gao, L., Fan, K., Song, J., Liu, X., Xu, X., Shen, H.T.: Deliberate attention networks for image captioning. In: AAAI Conference on Artificial Intelligence (2019)