1. O. Vinyals, A. Toshev, S. Bengio, D. Erhan, Show and tell: a neural image caption generator, CVPR (2015).
2. J. Mao et al., Deep captioning with multimodal recurrent neural networks (m-RNN), ICLR (2015).
3. R. Kiros, R. Salakhutdinov, R.S. Zemel, Unifying visual-semantic embeddings with multimodal neural language models, arXiv:1411.2539 (2014).
4. A. Karpathy, L. Fei-Fei, Deep visual-semantic alignments for generating image descriptions, CVPR (2015).
5. K. Cho, A. Courville, Y. Bengio, Describing multimedia content using attention-based encoder–decoder networks, IEEE Transactions on Multimedia (2015).