1. J. Lu, C. Xiong, D. Parikh, R. Socher, Knowing when to look: adaptive attention via a visual sentinel for image captioning, arXiv preprint arXiv:1612.01887, 2016.
2. Q. You et al., Image captioning with semantic attention, in: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016.
3. K. Xu et al., Show, attend and tell: neural image caption generation with visual attention, in: Proc. Int. Conf. on Machine Learning (ICML), 2015.
4. P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, L. Zhang, Bottom-up and top-down attention for image captioning and visual question answering, arXiv preprint arXiv:1707.07998v2, 2017.
5. Q. Jin et al., Describing videos using multi-modal fusion, in: Proc. ACM Multimedia Conf. (ACM MM), 2016.