1. Anderson P, He X, Buehler C, Teney D, Johnson M, Gould S, Zhang L. Bottom-up and top-down attention for image captioning and visual question answering. In: CVPR, 2018, pp. 6077–86.
2. Hendricks LA, et al. Deep compositional captioning: describing novel object categories without paired training data. In: CVPR, 2016, pp. 1–10.
3. Chen H, Ding G, Lin Z, Zhao S, Han J. Show, observe and tell: attribute-driven attention model for image captioning. In: IJCAI, 2018, pp. 606–12.
4. Chen M, Ding G, Zhao S, Chen H, Liu Q, Han J. Reference based LSTM for image captioning. In: AAAI, 2017, pp. 3981–87.
5. Chen H, Zhang H, Chen PY, Yi J, Hsieh CJ. Show-and-fool: crafting adversarial examples for neural image captioning. arXiv preprint. 2017; arXiv:1712.02051.