1. P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, L. Zhang, Bottom-up and top-down attention for image captioning and VQA, arXiv:1707.07998 (2017).
2. L.A. Hendricks et al., Deep compositional captioning: Describing novel object categories without paired training data (2016).
3. Bin et al., Describing video with attention-based bidirectional LSTM, IEEE Trans. Cybern. (2018).
4. Bin et al., Adaptively attending to visual attributes and linguistic knowledge for captioning (2017).
5. L. Chen, H. Zhang, J. Xiao, L. Nie, J. Shao, W. Liu, T.S. Chua, SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning (2017) 6298–6306.