Funder
Vietnam National University Ho Chi Minh City
Publisher
Springer Science and Business Media LLC
Reference46 articles.
1. Zhong, M., Zhang, H., Wang, Y., Xiong, H.: Bitransformer: augmenting semantic context in video captioning via bidirectional decoder. Mach. Vis. Appl. 33(5), 77 (2022)
2. You, Q., Jin, H., Wang, Z., Fang, C., Luo, J.: Image captioning with semantic attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4651–4659 (2016)
3. Jiang, W., Ma, L., Jiang, Y.-G., Liu, W., Zhang, T.: Recurrent fusion network for image captioning. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 499–515 (2018)
4. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2015)
5. Xu, K., Ba, J., Kiros, R., et al.: Show, attend and tell: Neural image caption generation with visual attention. In: International Conference on Machine Learning, PMLR, pp. 2048–2057 (2015)