1. Bottom-up and top-down attention for image captioning and visual question answering;Anderson,2018
2. Show, observe and tell: Attribute-driven attention model for image captioning;Chen,2018
3. SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning;Chen,2017
4. Meteor Universal: Language specific translation evaluation for any target language;Denkowski,2014
5. From captions to visual concepts and back;Fang,2015