Author:
Xin Bowen, Xu Ning, Zhai Yingchen, Zhang Tingting, Lu Zimu, Liu Jing, Nie Weizhi, Li Xuanya, Liu An-An
Funder
National Natural Science Foundation of China
China Postdoctoral Science Foundation
National Key Research and Development Program of China
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications, Hardware and Architecture, Media Technology, Information Systems, Software
Reference
223 articles.
1. Aafaq, N., Akhtar, N., Liu, W., et al.: Spatio-temporal dynamics and semantic attribute enriched visual encoding for video captioning. In: CVPR, pp. 12487–12496 (2019)
2. Anderson, P., Fernando, B., Johnson, M., et al.: SPICE: semantic propositional image caption evaluation. In: ECCV, pp. 382–398 (2016)
3. Anderson, P., He, X., Buehler, C., et al.: Bottom-up and top-down attention for image captioning and visual question answering. In: CVPR, pp. 6077–6086 (2018)
4. Aneja, J., Agrawal, H., Batra, D., et al.: Sequential latent spaces for modeling the intention during diverse image captioning. In: ICCV, pp. 4260–4269 (2019)
5. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)