1. Deep visual-semantic alignments for generating image descriptions;Karpathy,2015
2. Show, attend and tell: Neural image caption generation with visual attention;Xu,2015
3. Knowing when to look: Adaptive attention via a visual sentinel for image captioning;Lu,2017
4. Bottom-up and top-down attention for image captioning and visual question answering;Anderson,2018
5. SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning;Chen,2017