Author:
Liu Yang,Tan Ying,Luo Jingzhou,Chen Weixing
Publisher
Springer Nature Singapore
Reference41 articles.
1. Abbasnejad, E., Teney, D., Parvaneh, A., Shi, J., Hengel, A.V.D.: Counterfactual vision and language learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10044–10054 (2020)
2. Antol, S., et al.: VQA: visual question answering. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2425–2433 (2015)
3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
4. Gao, J., Ge, R., Chen, K., Nevatia, R.: Motion-appearance co-memory networks for video question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6576–6585 (2018)
5. Gao, L., Lei, Y., Zeng, P., Song, J., Wang, M., Shen, H.T.: Hierarchical representation network with auxiliary tasks for video captioning and video question answering. IEEE Trans. Image Process. 31, 202–215 (2022)
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献