Funders
China Scholarship Council
National Key Research and Development Program of China
National Natural Science Foundation of China
References (49 articles)
1. A. Karpathy, F.F. Li, Deep visual-semantic alignments for generating image descriptions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3128–3137.
2. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of the International Conference on Neural Information Processing Systems, 2017, pp. 6000–6010.
3. G. Li, L. Zhu, P. Liu, Y. Yang, Entangled Transformer for image captioning, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 8927–8936.
4. M. Cornia, M. Stefanini, L. Baraldi, R. Cucchiara, Meshed-memory transformer for image captioning, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 10575–10584.
5. Y. Luo, J. Ji, X. Sun, L. Cao, Y. Wu, F. Huang, C.-W. Lin, R. Ji, Dual-Level Collaborative Transformer for Image Captioning, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2021, pp. 2286–2293.
Cited by: 3 articles